| author | Christian Krinitsin <mail@krinitsin.com> | 2025-07-03 19:39:53 +0200 |
|---|---|---|
| committer | Christian Krinitsin <mail@krinitsin.com> | 2025-07-03 19:39:53 +0200 |
| commit | dee4dcba78baf712cab403d47d9db319ab7f95d6 (patch) | |
| tree | 418478faf06786701a56268672f73d6b0b4eb239 /results/classifier/016/virtual | |
| parent | 4d9e26c0333abd39bdbd039dcdb30ed429c475ba (diff) | |
restructure results
Diffstat (limited to 'results/classifier/016/virtual')
| -rw-r--r-- | results/classifier/016/virtual/04472277 | 603 |
| -rw-r--r-- | results/classifier/016/virtual/16201167 | 127 |
| -rw-r--r-- | results/classifier/016/virtual/24190340 | 2083 |
| -rw-r--r-- | results/classifier/016/virtual/24930826 | 60 |
| -rw-r--r-- | results/classifier/016/virtual/25892827 | 1104 |
| -rw-r--r-- | results/classifier/016/virtual/35170175 | 548 |
| -rw-r--r-- | results/classifier/016/virtual/36568044 | 4608 |
| -rw-r--r-- | results/classifier/016/virtual/46572227 | 433 |
| -rw-r--r-- | results/classifier/016/virtual/53568181 | 105 |
| -rw-r--r-- | results/classifier/016/virtual/57231878 | 269 |
| -rw-r--r-- | results/classifier/016/virtual/67821138 | 226 |
| -rw-r--r-- | results/classifier/016/virtual/70021271 | 7475 |
| -rw-r--r-- | results/classifier/016/virtual/70416488 | 1206 |
| -rw-r--r-- | results/classifier/016/virtual/74466963 | 1905 |
| -rw-r--r-- | results/classifier/016/virtual/79834768 | 436 |
15 files changed, 0 insertions, 21188 deletions
diff --git a/results/classifier/016/virtual/04472277 b/results/classifier/016/virtual/04472277 deleted file mode 100644 index 307fd76c..00000000 --- a/results/classifier/016/virtual/04472277 +++ /dev/null @@ -1,603 +0,0 @@ -virtual: 0.939 -KVM: 0.879 -x86: 0.774 -debug: 0.772 -files: 0.742 -hypervisor: 0.710 -user-level: 0.641 -operating system: 0.244 -boot: 0.068 -kernel: 0.045 -PID: 0.025 -performance: 0.024 -TCG: 0.022 -VMM: 0.019 -register: 0.018 -socket: 0.017 -semantic: 0.015 -device: 0.010 -risc-v: 0.010 -network: 0.007 -architecture: 0.006 -ppc: 0.006 -alpha: 0.005 -graphic: 0.003 -assembly: 0.003 -vnc: 0.003 -peripherals: 0.002 -permissions: 0.002 -i386: 0.001 -arm: 0.001 -mistranslation: 0.001 - -[BUG][KVM_SET_USER_MEMORY_REGION] KVM_SET_USER_MEMORY_REGION failed - -Hi all, -I start a VM in openstack, and openstack use libvirt to start qemu VM, but now log show this ERROR. -Is there any one know this? -The ERROR log from /var/log/libvirt/qemu/instance-0000000e.log -``` -2023-03-14T10:09:17.674114Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument -kvm_set_phys_mem: error registering slot: Invalid argument -2023-03-14 10:09:18.198+0000: shutting down, reason=crashed -``` -The xml file -``` -root@c1c2:~# cat /etc/libvirt/qemu/instance-0000000e.xml -<!-- -WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE -OVERWRITTEN AND LOST. Changes to this xml configuration should be made using: - virsh edit instance-0000000e -or other application using the libvirt API. 
---> -<domain type='kvm'> - <name>instance-0000000e</name> - <uuid>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</uuid> - <metadata> -  <nova:instance xmlns:nova=" -http://openstack.org/xmlns/libvirt/nova/1.1 -"> -   <nova:package version="25.1.0"/> -   <nova:name>provider-instance</nova:name> -   <nova:creationTime>2023-03-14 10:09:13</nova:creationTime> -   <nova:flavor name="cirros-os-dpu-test-1"> -    <nova:memory>64</nova:memory> -    <nova:disk>1</nova:disk> -    <nova:swap>0</nova:swap> -    <nova:ephemeral>0</nova:ephemeral> -    <nova:vcpus>1</nova:vcpus> -   </nova:flavor> -   <nova:owner> -    <nova:user uuid="ff627ad39ed94479b9c5033bc462cf78">admin</nova:user> -    <nova:project uuid="512866f9994f4ad8916d8539a7cdeec9">admin</nova:project> -   </nova:owner> -   <nova:root type="image" uuid="9e58cb69-316a-4093-9f23-c1d1bd8edffe"/> -   <nova:ports> -    <nova:port uuid="77c1dc00-af39-4463-bea0-12808f4bc340"> -     <nova:ip type="fixed" address="172.1.1.43" ipVersion="4"/> -    </nova:port> -   </nova:ports> -  </nova:instance> - </metadata> - <memory unit='KiB'>65536</memory> - <currentMemory unit='KiB'>65536</currentMemory> - <vcpu placement='static'>1</vcpu> - <sysinfo type='smbios'> -  <system> -   <entry name='manufacturer'>OpenStack Foundation</entry> -   <entry name='product'>OpenStack Nova</entry> -   <entry name='version'>25.1.0</entry> -   <entry name='serial'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> -   <entry name='uuid'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> -   <entry name='family'>Virtual Machine</entry> -  </system> - </sysinfo> - <os> -  <type arch='x86_64' machine='pc-i440fx-6.2'>hvm</type> -  <boot dev='hd'/> -  <smbios mode='sysinfo'/> - </os> - <features> -  <acpi/> -  <apic/> -  <vmcoreinfo state='on'/> - </features> - <cpu mode='host-model' check='partial'> -  <topology sockets='1' dies='1' cores='1' threads='1'/> - </cpu> - <clock offset='utc'> -  <timer name='pit' tickpolicy='delay'/> -  <timer name='rtc' tickpolicy='catchup'/> -  
<timer name='hpet' present='no'/> - </clock> - <on_poweroff>destroy</on_poweroff> - <on_reboot>restart</on_reboot> - <on_crash>destroy</on_crash> - <devices> -  <emulator>/usr/bin/qemu-system-x86_64</emulator> -  <disk type='file' device='disk'> -   <driver name='qemu' type='qcow2' cache='none'/> -   <source file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk'/> -   <target dev='vda' bus='virtio'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> -  </disk> -  <controller type='usb' index='0' model='piix3-uhci'> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> -  </controller> -  <controller type='pci' index='0' model='pci-root'/> -  <interface type='hostdev' managed='yes'> -   <mac address='fa:16:3e:aa:d9:23'/> -   <source> -    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x5'/> -   </source> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> -  </interface> -  <serial type='pty'> -   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> -   <target type='isa-serial' port='0'> -    <model name='isa-serial'/> -   </target> -  </serial> -  <console type='pty'> -   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> -   <target type='serial' port='0'/> -  </console> -  <input type='tablet' bus='usb'> -   <address type='usb' bus='0' port='1'/> -  </input> -  <input type='mouse' bus='ps2'/> -  <input type='keyboard' bus='ps2'/> -  <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'> -   <listen type='address' address='0.0.0.0'/> -  </graphics> -  <audio id='1' type='none'/> -  <video> -   <model type='virtio' heads='1' primary='yes'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> -  </video> -  <hostdev mode='subsystem' type='pci' managed='yes'> -   <source> -    <address domain='0x0000' bus='0x01' 
slot='0x00' function='0x6'/> -   </source> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> -  </hostdev> -  <memballoon model='virtio'> -   <stats period='10'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> -  </memballoon> -  <rng model='virtio'> -   <backend model='random'>/dev/urandom</backend> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> -  </rng> - </devices> -</domain> -``` ----- -Simon Jones - -This is happened in ubuntu22.04. -QEMU is install by apt like this: -apt install -y qemu qemu-kvm qemu-system -and QEMU version is 6.2.0 ----- -Simon Jones -Simon Jones < -batmanustc@gmail.com -> wrote on Tue, Mar 21, 2023 at 08:40: -Hi all, -I start a VM in openstack, and openstack use libvirt to start qemu VM, but now log show this ERROR. -Is there any one know this? -The ERROR log from /var/log/libvirt/qemu/instance-0000000e.log -``` -2023-03-14T10:09:17.674114Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument -kvm_set_phys_mem: error registering slot: Invalid argument -2023-03-14 10:09:18.198+0000: shutting down, reason=crashed -``` -The xml file -``` -root@c1c2:~# cat /etc/libvirt/qemu/instance-0000000e.xml -<!-- -WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE -OVERWRITTEN AND LOST. Changes to this xml configuration should be made using: - virsh edit instance-0000000e -or other application using the libvirt API. 
---> -<domain type='kvm'> - <name>instance-0000000e</name> - <uuid>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</uuid> - <metadata> -  <nova:instance xmlns:nova=" -http://openstack.org/xmlns/libvirt/nova/1.1 -"> -   <nova:package version="25.1.0"/> -   <nova:name>provider-instance</nova:name> -   <nova:creationTime>2023-03-14 10:09:13</nova:creationTime> -   <nova:flavor name="cirros-os-dpu-test-1"> -    <nova:memory>64</nova:memory> -    <nova:disk>1</nova:disk> -    <nova:swap>0</nova:swap> -    <nova:ephemeral>0</nova:ephemeral> -    <nova:vcpus>1</nova:vcpus> -   </nova:flavor> -   <nova:owner> -    <nova:user uuid="ff627ad39ed94479b9c5033bc462cf78">admin</nova:user> -    <nova:project uuid="512866f9994f4ad8916d8539a7cdeec9">admin</nova:project> -   </nova:owner> -   <nova:root type="image" uuid="9e58cb69-316a-4093-9f23-c1d1bd8edffe"/> -   <nova:ports> -    <nova:port uuid="77c1dc00-af39-4463-bea0-12808f4bc340"> -     <nova:ip type="fixed" address="172.1.1.43" ipVersion="4"/> -    </nova:port> -   </nova:ports> -  </nova:instance> - </metadata> - <memory unit='KiB'>65536</memory> - <currentMemory unit='KiB'>65536</currentMemory> - <vcpu placement='static'>1</vcpu> - <sysinfo type='smbios'> -  <system> -   <entry name='manufacturer'>OpenStack Foundation</entry> -   <entry name='product'>OpenStack Nova</entry> -   <entry name='version'>25.1.0</entry> -   <entry name='serial'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> -   <entry name='uuid'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> -   <entry name='family'>Virtual Machine</entry> -  </system> - </sysinfo> - <os> -  <type arch='x86_64' machine='pc-i440fx-6.2'>hvm</type> -  <boot dev='hd'/> -  <smbios mode='sysinfo'/> - </os> - <features> -  <acpi/> -  <apic/> -  <vmcoreinfo state='on'/> - </features> - <cpu mode='host-model' check='partial'> -  <topology sockets='1' dies='1' cores='1' threads='1'/> - </cpu> - <clock offset='utc'> -  <timer name='pit' tickpolicy='delay'/> -  <timer name='rtc' tickpolicy='catchup'/> -  
<timer name='hpet' present='no'/> - </clock> - <on_poweroff>destroy</on_poweroff> - <on_reboot>restart</on_reboot> - <on_crash>destroy</on_crash> - <devices> -  <emulator>/usr/bin/qemu-system-x86_64</emulator> -  <disk type='file' device='disk'> -   <driver name='qemu' type='qcow2' cache='none'/> -   <source file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk'/> -   <target dev='vda' bus='virtio'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> -  </disk> -  <controller type='usb' index='0' model='piix3-uhci'> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> -  </controller> -  <controller type='pci' index='0' model='pci-root'/> -  <interface type='hostdev' managed='yes'> -   <mac address='fa:16:3e:aa:d9:23'/> -   <source> -    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x5'/> -   </source> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> -  </interface> -  <serial type='pty'> -   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> -   <target type='isa-serial' port='0'> -    <model name='isa-serial'/> -   </target> -  </serial> -  <console type='pty'> -   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> -   <target type='serial' port='0'/> -  </console> -  <input type='tablet' bus='usb'> -   <address type='usb' bus='0' port='1'/> -  </input> -  <input type='mouse' bus='ps2'/> -  <input type='keyboard' bus='ps2'/> -  <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'> -   <listen type='address' address='0.0.0.0'/> -  </graphics> -  <audio id='1' type='none'/> -  <video> -   <model type='virtio' heads='1' primary='yes'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> -  </video> -  <hostdev mode='subsystem' type='pci' managed='yes'> -   <source> -    <address domain='0x0000' bus='0x01' 
slot='0x00' function='0x6'/> -   </source> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> -  </hostdev> -  <memballoon model='virtio'> -   <stats period='10'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> -  </memballoon> -  <rng model='virtio'> -   <backend model='random'>/dev/urandom</backend> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> -  </rng> - </devices> -</domain> -``` ----- -Simon Jones - -This is full ERROR log -2023-03-23 08:00:52.362+0000: starting up libvirt version: 8.0.0, package: 1ubuntu7.4 (Christian Ehrhardt < -christian.ehrhardt@canonical.com -> Tue, 22 Nov 2022 15:59:28 +0100), qemu version: 6.2.0Debian 1:6.2+dfsg-2ubuntu6.6, kernel: 5.19.0-35-generic, hostname: c1c2 -LC_ALL=C \ -PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \ -HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e \ -XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e/.local/share \ -XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e/.cache \ -XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e/.config \ -/usr/bin/qemu-system-x86_64 \ --name guest=instance-0000000e,debug-threads=on \ --S \ --object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-4-instance-0000000e/master-key.aes"}' \ --machine pc-i440fx-6.2,usb=off,dump-guest-core=off,memory-backend=pc.ram \ --accel kvm \ --cpu Cooperlake,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,sha-ni=on,umip=on,waitpkg=on,gfni=on,vaes=on,vpclmulqdq=on,rdpid=on,movdiri=on,movdir64b=on,fsrm=on,md-clear=on,avx-vnni=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,hle=off,rtm=off,avx512f=off,avx512dq=off,avx512cd=off,avx512bw=off,avx512vl=off,avx512vnni=off,avx512-bf16=off,taa-no=off \ --m 64 \ --object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":67108864}' \ --overcommit mem-lock=off \ --smp 
1,sockets=1,dies=1,cores=1,threads=1 \ --uuid ff91d2dc-69a1-43ef-abde-c9e4e9a0305b \ --smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=25.1.0,serial=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,uuid=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,family=Virtual Machine' \ --no-user-config \ --nodefaults \ --chardev socket,id=charmonitor,fd=33,server=on,wait=off \ --mon chardev=charmonitor,id=monitor,mode=control \ --rtc base=utc,driftfix=slew \ --global kvm-pit.lost_tick_policy=delay \ --no-hpet \ --no-shutdown \ --boot strict=on \ --device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ --blockdev '{"driver":"file","filename":"/var/lib/nova/instances/_base/8b58db82a488248e7c5e769599954adaa47a5314","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ --blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \ --blockdev '{"driver":"file","filename":"/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ --blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \ --device virtio-blk-pci,bus=pci.0,addr=0x3,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \ --add-fd set=1,fd=34 \ --chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on \ --device isa-serial,chardev=charserial0,id=serial0 \ --device usb-tablet,id=input0,bus=usb.0,port=1 \ --audiodev '{"id":"audio1","driver":"none"}' \ --vnc -0.0.0.0:0 -,audiodev=audio1 \ --device virtio-vga,id=video0,max_outputs=1,bus=pci.0,addr=0x2 \ --device vfio-pci,host=0000:01:00.5,id=hostdev0,bus=pci.0,addr=0x4 \ --device vfio-pci,host=0000:01:00.6,id=hostdev1,bus=pci.0,addr=0x5 \ --device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \ --object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \ --device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 \ --device vmcoreinfo \ --sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ --msg timestamp=on -char device redirected to /dev/pts/3 (label charserial0) -2023-03-23T08:00:53.728550Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument -kvm_set_phys_mem: error registering slot: Invalid argument -2023-03-23 08:00:54.201+0000: shutting down, reason=crashed -2023-03-23 08:54:43.468+0000: starting up libvirt version: 8.0.0, package: 1ubuntu7.4 (Christian Ehrhardt < -christian.ehrhardt@canonical.com -> Tue, 22 Nov 2022 15:59:28 +0100), qemu version: 6.2.0Debian 1:6.2+dfsg-2ubuntu6.6, kernel: 5.19.0-35-generic, hostname: c1c2 -LC_ALL=C \ -PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \ -HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e \ -XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e/.local/share \ -XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e/.cache \ -XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e/.config \ -/usr/bin/qemu-system-x86_64 \ --name guest=instance-0000000e,debug-threads=on \ --S \ --object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-5-instance-0000000e/master-key.aes"}' \ --machine pc-i440fx-6.2,usb=off,dump-guest-core=off,memory-backend=pc.ram \ --accel kvm \ --cpu 
Cooperlake,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,sha-ni=on,umip=on,waitpkg=on,gfni=on,vaes=on,vpclmulqdq=on,rdpid=on,movdiri=on,movdir64b=on,fsrm=on,md-clear=on,avx-vnni=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,hle=off,rtm=off,avx512f=off,avx512dq=off,avx512cd=off,avx512bw=off,avx512vl=off,avx512vnni=off,avx512-bf16=off,taa-no=off \ --m 64 \ --object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":67108864}' \ --overcommit mem-lock=off \ --smp 1,sockets=1,dies=1,cores=1,threads=1 \ --uuid ff91d2dc-69a1-43ef-abde-c9e4e9a0305b \ --smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=25.1.0,serial=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,uuid=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,family=Virtual Machine' \ --no-user-config \ --nodefaults \ --chardev socket,id=charmonitor,fd=33,server=on,wait=off \ --mon chardev=charmonitor,id=monitor,mode=control \ --rtc base=utc,driftfix=slew \ --global kvm-pit.lost_tick_policy=delay \ --no-hpet \ --no-shutdown \ --boot strict=on \ --device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ --blockdev '{"driver":"file","filename":"/var/lib/nova/instances/_base/8b58db82a488248e7c5e769599954adaa47a5314","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ --blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \ --blockdev '{"driver":"file","filename":"/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ --blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \ --device virtio-blk-pci,bus=pci.0,addr=0x3,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \ --add-fd 
set=1,fd=34 \ --chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on \ --device isa-serial,chardev=charserial0,id=serial0 \ --device usb-tablet,id=input0,bus=usb.0,port=1 \ --audiodev '{"id":"audio1","driver":"none"}' \ --vnc -0.0.0.0:0 -,audiodev=audio1 \ --device virtio-vga,id=video0,max_outputs=1,bus=pci.0,addr=0x2 \ --device vfio-pci,host=0000:01:00.5,id=hostdev0,bus=pci.0,addr=0x4 \ --device vfio-pci,host=0000:01:00.6,id=hostdev1,bus=pci.0,addr=0x5 \ --device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \ --object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \ --device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 \ --device vmcoreinfo \ --sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ --msg timestamp=on -char device redirected to /dev/pts/3 (label charserial0) -2023-03-23T08:54:44.755039Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument -kvm_set_phys_mem: error registering slot: Invalid argument -2023-03-23 08:54:45.230+0000: shutting down, reason=crashed ----- -Simon Jones -Simon Jones < -batmanustc@gmail.com -> wrote on Thu, Mar 23, 2023 at 05:49: -This is happened in ubuntu22.04. -QEMU is install by apt like this: -apt install -y qemu qemu-kvm qemu-system -and QEMU version is 6.2.0 ----- -Simon Jones -Simon Jones < -batmanustc@gmail.com -> wrote on Tue, Mar 21, 2023 at 08:40: -Hi all, -I start a VM in openstack, and openstack use libvirt to start qemu VM, but now log show this ERROR. -Is there any one know this? 
-The ERROR log from /var/log/libvirt/qemu/instance-0000000e.log -``` -2023-03-14T10:09:17.674114Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument -kvm_set_phys_mem: error registering slot: Invalid argument -2023-03-14 10:09:18.198+0000: shutting down, reason=crashed -``` -The xml file -``` -root@c1c2:~# cat /etc/libvirt/qemu/instance-0000000e.xml -<!-- -WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE -OVERWRITTEN AND LOST. Changes to this xml configuration should be made using: - virsh edit instance-0000000e -or other application using the libvirt API. ---> -<domain type='kvm'> - <name>instance-0000000e</name> - <uuid>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</uuid> - <metadata> -  <nova:instance xmlns:nova=" -http://openstack.org/xmlns/libvirt/nova/1.1 -"> -   <nova:package version="25.1.0"/> -   <nova:name>provider-instance</nova:name> -   <nova:creationTime>2023-03-14 10:09:13</nova:creationTime> -   <nova:flavor name="cirros-os-dpu-test-1"> -    <nova:memory>64</nova:memory> -    <nova:disk>1</nova:disk> -    <nova:swap>0</nova:swap> -    <nova:ephemeral>0</nova:ephemeral> -    <nova:vcpus>1</nova:vcpus> -   </nova:flavor> -   <nova:owner> -    <nova:user uuid="ff627ad39ed94479b9c5033bc462cf78">admin</nova:user> -    <nova:project uuid="512866f9994f4ad8916d8539a7cdeec9">admin</nova:project> -   </nova:owner> -   <nova:root type="image" uuid="9e58cb69-316a-4093-9f23-c1d1bd8edffe"/> -   <nova:ports> -    <nova:port uuid="77c1dc00-af39-4463-bea0-12808f4bc340"> -     <nova:ip type="fixed" address="172.1.1.43" ipVersion="4"/> -    </nova:port> -   </nova:ports> -  </nova:instance> - </metadata> - <memory unit='KiB'>65536</memory> - <currentMemory unit='KiB'>65536</currentMemory> - <vcpu placement='static'>1</vcpu> - <sysinfo type='smbios'> -  <system> -   <entry name='manufacturer'>OpenStack Foundation</entry> -   <entry name='product'>OpenStack 
Nova</entry> -   <entry name='version'>25.1.0</entry> -   <entry name='serial'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> -   <entry name='uuid'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> -   <entry name='family'>Virtual Machine</entry> -  </system> - </sysinfo> - <os> -  <type arch='x86_64' machine='pc-i440fx-6.2'>hvm</type> -  <boot dev='hd'/> -  <smbios mode='sysinfo'/> - </os> - <features> -  <acpi/> -  <apic/> -  <vmcoreinfo state='on'/> - </features> - <cpu mode='host-model' check='partial'> -  <topology sockets='1' dies='1' cores='1' threads='1'/> - </cpu> - <clock offset='utc'> -  <timer name='pit' tickpolicy='delay'/> -  <timer name='rtc' tickpolicy='catchup'/> -  <timer name='hpet' present='no'/> - </clock> - <on_poweroff>destroy</on_poweroff> - <on_reboot>restart</on_reboot> - <on_crash>destroy</on_crash> - <devices> -  <emulator>/usr/bin/qemu-system-x86_64</emulator> -  <disk type='file' device='disk'> -   <driver name='qemu' type='qcow2' cache='none'/> -   <source file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk'/> -   <target dev='vda' bus='virtio'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> -  </disk> -  <controller type='usb' index='0' model='piix3-uhci'> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> -  </controller> -  <controller type='pci' index='0' model='pci-root'/> -  <interface type='hostdev' managed='yes'> -   <mac address='fa:16:3e:aa:d9:23'/> -   <source> -    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x5'/> -   </source> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> -  </interface> -  <serial type='pty'> -   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> -   <target type='isa-serial' port='0'> -    <model name='isa-serial'/> -   </target> -  </serial> -  <console type='pty'> -   <log 
file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> -   <target type='serial' port='0'/> -  </console> -  <input type='tablet' bus='usb'> -   <address type='usb' bus='0' port='1'/> -  </input> -  <input type='mouse' bus='ps2'/> -  <input type='keyboard' bus='ps2'/> -  <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'> -   <listen type='address' address='0.0.0.0'/> -  </graphics> -  <audio id='1' type='none'/> -  <video> -   <model type='virtio' heads='1' primary='yes'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> -  </video> -  <hostdev mode='subsystem' type='pci' managed='yes'> -   <source> -    <address domain='0x0000' bus='0x01' slot='0x00' function='0x6'/> -   </source> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> -  </hostdev> -  <memballoon model='virtio'> -   <stats period='10'/> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> -  </memballoon> -  <rng model='virtio'> -   <backend model='random'>/dev/urandom</backend> -   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> -  </rng> - </devices> -</domain> -``` ----- -Simon Jones - diff --git a/results/classifier/016/virtual/16201167 b/results/classifier/016/virtual/16201167 deleted file mode 100644 index f1cb262a..00000000 --- a/results/classifier/016/virtual/16201167 +++ /dev/null @@ -1,127 +0,0 @@ -virtual: 0.917 -KVM: 0.915 -hypervisor: 0.764 -debug: 0.653 -kernel: 0.598 -operating system: 0.556 -assembly: 0.306 -performance: 0.135 -TCG: 0.109 -PID: 0.107 -files: 0.093 -x86: 0.074 -VMM: 0.045 -register: 0.033 -device: 0.025 -risc-v: 0.018 -user-level: 0.017 -i386: 0.016 -ppc: 0.015 -arm: 0.010 -semantic: 0.009 -architecture: 0.009 -boot: 0.006 -alpha: 0.006 -network: 0.004 -socket: 0.003 -vnc: 0.003 -graphic: 0.003 -peripherals: 0.002 -permissions: 0.001 -mistranslation: 0.000 - -[BUG] Qemu abort with error 
"kvm_mem_ioeventfd_add: error adding ioeventfd: File exists (17)" - -Hi list, - -When I did some tests in my virtual domain with live-attached virtio deivces, I -got a coredump file of Qemu. - -The error print from qemu is "kvm_mem_ioeventfd_add: error adding ioeventfd: -File exists (17)". -And the call trace in the coredump file displays as below: -#0 0x0000ffff89acecc8 in ?? () from /usr/lib64/libc.so.6 -#1 0x0000ffff89a8acbc in raise () from /usr/lib64/libc.so.6 -#2 0x0000ffff89a78d2c in abort () from /usr/lib64/libc.so.6 -#3 0x0000aaaabd7ccf1c in kvm_mem_ioeventfd_add (listener=<optimized out>, -section=<optimized out>, match_data=<optimized out>, data=<optimized out>, -e=<optimized out>) at ../accel/kvm/kvm-all.c:1607 -#4 0x0000aaaabd6e0304 in address_space_add_del_ioeventfds (fds_old_nb=164, -fds_old=0xffff5c80a1d0, fds_new_nb=160, fds_new=0xffff5c565080, -as=0xaaaabdfa8810 <address_space_memory>) - at ../softmmu/memory.c:795 -#5 address_space_update_ioeventfds (as=0xaaaabdfa8810 <address_space_memory>) -at ../softmmu/memory.c:856 -#6 0x0000aaaabd6e24d8 in memory_region_commit () at ../softmmu/memory.c:1113 -#7 0x0000aaaabd6e25c4 in memory_region_transaction_commit () at -../softmmu/memory.c:1144 -#8 0x0000aaaabd394eb4 in pci_bridge_update_mappings -(br=br@entry=0xaaaae755f7c0) at ../hw/pci/pci_bridge.c:248 -#9 0x0000aaaabd394f4c in pci_bridge_write_config (d=0xaaaae755f7c0, -address=44, val=<optimized out>, len=4) at ../hw/pci/pci_bridge.c:272 -#10 0x0000aaaabd39a928 in rp_write_config (d=0xaaaae755f7c0, address=44, -val=128, len=4) at ../hw/pci-bridge/pcie_root_port.c:39 -#11 0x0000aaaabd6df328 in memory_region_write_accessor (mr=0xaaaae63898d0, -addr=65580, value=<optimized out>, size=4, shift=<optimized out>, -mask=<optimized out>, attrs=...) 
at ../softmmu/memory.c:494 -#12 0x0000aaaabd6dcb6c in access_with_adjusted_size (addr=addr@entry=65580, -value=value@entry=0xffff817adc78, size=size@entry=4, access_size_min=<optimized -out>, access_size_max=<optimized out>, - access_fn=access_fn@entry=0xaaaabd6df284 <memory_region_write_accessor>, -mr=mr@entry=0xaaaae63898d0, attrs=attrs@entry=...) at ../softmmu/memory.c:556 -#13 0x0000aaaabd6e0dc8 in memory_region_dispatch_write -(mr=mr@entry=0xaaaae63898d0, addr=65580, data=<optimized out>, op=MO_32, -attrs=attrs@entry=...) at ../softmmu/memory.c:1534 -#14 0x0000aaaabd6d0574 in flatview_write_continue (fv=fv@entry=0xffff5c02da00, -addr=addr@entry=275146407980, attrs=attrs@entry=..., -ptr=ptr@entry=0xffff8aa8c028, len=len@entry=4, - addr1=<optimized out>, l=<optimized out>, mr=mr@entry=0xaaaae63898d0) at -/usr/src/debug/qemu-6.2.0-226.aarch64/include/qemu/host-utils.h:165 -#15 0x0000aaaabd6d4584 in flatview_write (len=4, buf=0xffff8aa8c028, attrs=..., -addr=275146407980, fv=0xffff5c02da00) at ../softmmu/physmem.c:3375 -#16 address_space_write (as=<optimized out>, addr=275146407980, attrs=..., -buf=buf@entry=0xffff8aa8c028, len=4) at ../softmmu/physmem.c:3467 -#17 0x0000aaaabd6d462c in address_space_rw (as=<optimized out>, addr=<optimized -out>, attrs=..., attrs@entry=..., buf=buf@entry=0xffff8aa8c028, len=<optimized -out>, is_write=<optimized out>) - at ../softmmu/physmem.c:3477 -#18 0x0000aaaabd7cf6e8 in kvm_cpu_exec (cpu=cpu@entry=0xaaaae625dfd0) at -../accel/kvm/kvm-all.c:2970 -#19 0x0000aaaabd7d09bc in kvm_vcpu_thread_fn (arg=arg@entry=0xaaaae625dfd0) at -../accel/kvm/kvm-accel-ops.c:49 -#20 0x0000aaaabd94ccd8 in qemu_thread_start (args=<optimized out>) at -../util/qemu-thread-posix.c:559 - - -By printing more info in the coredump file, I found that the addr of -fds_old[146] and fds_new[146] are same, but fds_old[146] belonged to a -live-attached virtio-scsi device while fds_new[146] was owned by another -live-attached virtio-net. 
-The reason why addr conflicted was then been found from vm's console log. Just -before qemu aborted, the guest kernel crashed and kdump.service booted the -dump-capture kernel where re-alloced address for the devices. -Because those virtio devices were live-attached after vm creating, different -addr may been assigned to them in the dump-capture kernel: - -the initial kernel booting log: -[ 1.663297] pci 0000:00:02.1: BAR 14: assigned [mem 0x11900000-0x11afffff] -[ 1.664560] pci 0000:00:02.1: BAR 15: assigned [mem -0x8001800000-0x80019fffff 64bit pref] - -the dump-capture kernel booting log: -[ 1.845211] pci 0000:00:02.0: BAR 14: assigned [mem 0x11900000-0x11bfffff] -[ 1.846542] pci 0000:00:02.0: BAR 15: assigned [mem -0x8001800000-0x8001afffff 64bit pref] - - -I think directly aborting the qemu process may not be the best choice in this -case cuz it will interrupt the work of kdump.service so that failed to generate -memory dump of the crashed guest kernel. -Perhaps, IMO, the error could be simply ignored in this case and just let kdump -to reboot the system after memory-dump finishing, but I failed to find a -suitable judgment in the codes. - -Any solution for this problem? Hope I can get some helps here. 
- -Hao - diff --git a/results/classifier/016/virtual/24190340 b/results/classifier/016/virtual/24190340 deleted file mode 100644 index 0fdffdcd..00000000 --- a/results/classifier/016/virtual/24190340 +++ /dev/null @@ -1,2083 +0,0 @@ -virtual: 0.947 -debug: 0.831 -x86: 0.818 -KVM: 0.485 -kernel: 0.456 -TCG: 0.263 -operating system: 0.254 -hypervisor: 0.202 -register: 0.173 -PID: 0.069 -performance: 0.059 -socket: 0.056 -risc-v: 0.029 -VMM: 0.017 -user-level: 0.016 -device: 0.015 -network: 0.012 -files: 0.011 -assembly: 0.011 -semantic: 0.008 -vnc: 0.008 -ppc: 0.005 -architecture: 0.005 -alpha: 0.003 -graphic: 0.003 -peripherals: 0.003 -permissions: 0.002 -boot: 0.002 -i386: 0.001 -arm: 0.000 -mistranslation: 0.000 - -[BUG, RFC] Block graph deadlock on job-dismiss - -Hi all, - -There's a bug in block layer which leads to block graph deadlock. -Notably, it takes place when blockdev IO is processed within a separate -iothread. - -This was initially caught by our tests, and I was able to reduce it to a -relatively simple reproducer. Such deadlocks are probably supposed to -be covered in iotests/graph-changes-while-io, but this deadlock isn't. - -Basically what the reproducer does is launches QEMU with a drive having -'iothread' option set, creates a chain of 2 snapshots, launches -block-commit job for a snapshot and then dismisses the job, starting -from the lower snapshot. If the guest is issuing IO at the same time, -there's a race in acquiring block graph lock and a potential deadlock. - -Here's how it can be reproduced: - -1. 
Run QEMU: -> -SRCDIR=/path/to/srcdir -> -> -> -> -> -$SRCDIR/build/qemu-system-x86_64 -enable-kvm \ -> -> --machine q35 -cpu Nehalem \ -> -> --name guest=alma8-vm,debug-threads=on \ -> -> --m 2g -smp 2 \ -> -> --nographic -nodefaults \ -> -> --qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \ -> -> --serial unix:/var/run/alma8-serial.sock,server=on,wait=off \ -> -> --object iothread,id=iothread0 \ -> -> --blockdev -> -node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2 -> -\ -> --device virtio-blk-pci,drive=disk,iothread=iothread0 -2. Launch IO (random reads) from within the guest: -> -nc -U /var/run/alma8-serial.sock -> -... -> -[root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k -> ---size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting -> ---rw=randread --iodepth=1 --filename=/testfile -3. Run snapshots creation & removal of lower snapshot operation in a -loop (script attached): -> -while /bin/true ; do ./remove_lower_snap.sh ; done -And then it occasionally hangs. - -Note: I've tried bisecting this, and looks like deadlock occurs starting -from the following commit: - -(BAD) 5bdbaebcce virtio: Re-enable notifications after drain -(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll - -On the latest v10.0.0 it does hang as well. 
- - -Here's backtrace of the main thread: - -> -#0 0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1, -> -timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:43 -> -#1 0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1, -> -timeout=-1) at ../util/qemu-timer.c:329 -> -#2 0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20, -> -ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79 -> -#3 0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at -> -../util/aio-posix.c:730 -> -#4 0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950, -> -parent=0x0, poll=true) at ../block/io.c:378 -> -#5 0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at -> -../block/io.c:391 -> -#6 0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7682 -> -#7 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7608 -> -#8 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7668 -> -#9 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7608 -> -#10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7668 -> -#11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7608 -> -#12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0, -> -ctx=0x557eb76c5f20, 
visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../blockjob.c:157 -> -#13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7592 -> -#14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7661 -> -#15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx -> -(child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = -> -{...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234 -> -#16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7592 -> -#17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7661 -> -#18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0, -> -ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715 -> -#19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at -> -../block.c:3317 -> -#20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at -> -../blockjob.c:209 -> -#21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at -> -../blockjob.c:82 -> -#22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at -> -../job.c:474 -> -#23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at -> -../job.c:771 -> -#24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400, -> -errp=0x7ffd94b4f488) at ../job.c:783 -> ---Type <RET> for more, q to quit, c to continue without paging-- -> -#25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 "commit-snap1", -> -errp=0x7ffd94b4f488) at ../job-qmp.c:138 -> -#26 
0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0, -> -ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221 -> -#27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at -> -../qapi/qmp-dispatch.c:128 -> -#28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at -> -../util/async.c:172 -> -#29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at -> -../util/async.c:219 -> -#30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at -> -../util/aio-posix.c:436 -> -#31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200, -> -callback=0x0, user_data=0x0) at ../util/async.c:361 -> -#32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at -> -../glib/gmain.c:3364 -> -#33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079 -> -#34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287 -> -#35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at -> -../util/main-loop.c:310 -> -#36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at -> -../util/main-loop.c:589 -> -#37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835 -> -#38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at -> -../system/main.c:50 -> -#39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at -> -../system/main.c:80 -And here's coroutine trying to acquire read lock: - -> -(gdb) qemu coroutine reader_queue->entries.sqh_first -> -#0 0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0, -> -to_=0x7fc537fff508, action=COROUTINE_YIELD) at -> -../util/coroutine-ucontext.c:321 -> -#1 0x0000557eb47d4d4a in qemu_coroutine_yield () at -> -../util/qemu-coroutine.c:339 -> -#2 0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0 -> -<reader_queue>, lock=0x7fc53c57de50, flags=0) at -> -../util/qemu-coroutine-lock.c:60 -> -#3 0x0000557eb461fea7 in bdrv_graph_co_rdlock () at ../block/graph-lock.c:231 -> -#4 0x0000557eb460c81a in 
graph_lockable_auto_lock (x=0x7fc53c57dee3) at -> -/home/root/src/qemu/master/include/block/graph-lock.h:213 -> -#5 0x0000557eb460fa41 in blk_co_do_preadv_part -> -(blk=0x557eb84c0810, offset=6890553344, bytes=4096, qiov=0x7fc530006988, -> -qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at ../block/block-backend.c:1339 -> -#6 0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at -> -../block/block-backend.c:1619 -> -#7 0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886) at -> -../util/coroutine-ucontext.c:175 -> -#8 0x00007fc547c2a360 in __start_context () at -> -../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 -> -#9 0x00007ffd94b4ea40 in () -> -#10 0x0000000000000000 in () -So it looks like main thread is processing job-dismiss request and is -holding write lock taken in block_job_remove_all_bdrv() (frame #20 -above). At the same time iothread spawns a coroutine which performs IO -request. Before the coroutine is spawned, blk_aio_prwv() increases -'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is -trying to acquire the read lock. But main thread isn't releasing the -lock as blk_root_drained_poll() returns true since blk->in_flight > 0. -Here's the deadlock. - -Any comments and suggestions on the subject are welcomed. Thanks! - -Andrey -remove_lower_snap.sh -Description: -application/shellscript - -On 4/24/25 8:32 PM, Andrey Drobyshev wrote: -> -Hi all, -> -> -There's a bug in block layer which leads to block graph deadlock. -> -Notably, it takes place when blockdev IO is processed within a separate -> -iothread. -> -> -This was initially caught by our tests, and I was able to reduce it to a -> -relatively simple reproducer. Such deadlocks are probably supposed to -> -be covered in iotests/graph-changes-while-io, but this deadlock isn't. 
-> -> -Basically what the reproducer does is launches QEMU with a drive having -> -'iothread' option set, creates a chain of 2 snapshots, launches -> -block-commit job for a snapshot and then dismisses the job, starting -> -from the lower snapshot. If the guest is issuing IO at the same time, -> -there's a race in acquiring block graph lock and a potential deadlock. -> -> -Here's how it can be reproduced: -> -> -[...] -> -I took a closer look at iotests/graph-changes-while-io, and have managed -to reproduce the same deadlock in a much simpler setup, without a guest. - -1. Run QSD:> ./build/storage-daemon/qemu-storage-daemon --object -iothread,id=iothread0 \ -> ---blockdev null-co,node-name=node0,read-zeroes=true \ -> -> ---nbd-server addr.type=unix,addr.path=/var/run/qsd_nbd.sock \ -> -> ---export -> -nbd,id=exp0,node-name=node0,iothread=iothread0,fixed-iothread=true,writable=true -> -\ -> ---chardev -> -socket,id=qmp-sock,path=/var/run/qsd_qmp.sock,server=on,wait=off \ -> ---monitor chardev=qmp-sock -2. Launch IO: -> -qemu-img bench -f raw -c 2000000 -> -'nbd+unix:///node0?socket=/var/run/qsd_nbd.sock' -3. Add 2 snapshots and remove lower one (script attached):> while -/bin/true ; do ./rls_qsd.sh ; done - -And then it hangs. - -I'll also send a patch with corresponding test case added directly to -iotests. - -This reproduce seems to be hanging starting from Fiona's commit -67446e605dc ("blockjob: drop AioContext lock before calling -bdrv_graph_wrlock()"). AioContext locks were dropped entirely later on -in Stefan's commit b49f4755c7 ("block: remove AioContext locking"), but -the problem remains. - -Andrey -rls_qsd.sh -Description: -application/shellscript - -From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> - -This case is catching potential deadlock which takes place when job-dismiss -is issued when I/O requests are processed in a separate iothread. 
- -See -https://mail.gnu.org/archive/html/qemu-devel/2025-04/msg04421.html -Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> ---- - .../qemu-iotests/tests/graph-changes-while-io | 101 ++++++++++++++++-- - .../tests/graph-changes-while-io.out | 4 +- - 2 files changed, 96 insertions(+), 9 deletions(-) - -diff --git a/tests/qemu-iotests/tests/graph-changes-while-io -b/tests/qemu-iotests/tests/graph-changes-while-io -index 194fda500e..e30f823da4 100755 ---- a/tests/qemu-iotests/tests/graph-changes-while-io -+++ b/tests/qemu-iotests/tests/graph-changes-while-io -@@ -27,6 +27,8 @@ from iotests import imgfmt, qemu_img, qemu_img_create, -qemu_io, \ - - - top = os.path.join(iotests.test_dir, 'top.img') -+snap1 = os.path.join(iotests.test_dir, 'snap1.img') -+snap2 = os.path.join(iotests.test_dir, 'snap2.img') - nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock') - - -@@ -58,6 +60,15 @@ class TestGraphChangesWhileIO(QMPTestCase): - def tearDown(self) -> None: - self.qsd.stop() - -+ def _wait_for_blockjob(self, status) -> None: -+ done = False -+ while not done: -+ for event in self.qsd.get_qmp().get_events(wait=10.0): -+ if event['event'] != 'JOB_STATUS_CHANGE': -+ continue -+ if event['data']['status'] == status: -+ done = True -+ - def test_blockdev_add_while_io(self) -> None: - # Run qemu-img bench in the background - bench_thr = Thread(target=do_qemu_img_bench) -@@ -116,13 +127,89 @@ class TestGraphChangesWhileIO(QMPTestCase): - 'device': 'job0', - }) - -- cancelled = False -- while not cancelled: -- for event in self.qsd.get_qmp().get_events(wait=10.0): -- if event['event'] != 'JOB_STATUS_CHANGE': -- continue -- if event['data']['status'] == 'null': -- cancelled = True -+ self._wait_for_blockjob('null') -+ -+ bench_thr.join() -+ -+ def test_remove_lower_snapshot_while_io(self) -> None: -+ # Run qemu-img bench in the background -+ bench_thr = Thread(target=do_qemu_img_bench, args=(100000, )) -+ bench_thr.start() -+ -+ # While I/O is performed on 'node0' 
node, consequently add 2 snapshots -+ # on top of it, then remove (commit) them starting from lower one. -+ while bench_thr.is_alive(): -+ # Recreate snapshot images on every iteration -+ qemu_img_create('-f', imgfmt, snap1, '1G') -+ qemu_img_create('-f', imgfmt, snap2, '1G') -+ -+ self.qsd.cmd('blockdev-add', { -+ 'driver': imgfmt, -+ 'node-name': 'snap1', -+ 'file': { -+ 'driver': 'file', -+ 'filename': snap1 -+ } -+ }) -+ -+ self.qsd.cmd('blockdev-snapshot', { -+ 'node': 'node0', -+ 'overlay': 'snap1', -+ }) -+ -+ self.qsd.cmd('blockdev-add', { -+ 'driver': imgfmt, -+ 'node-name': 'snap2', -+ 'file': { -+ 'driver': 'file', -+ 'filename': snap2 -+ } -+ }) -+ -+ self.qsd.cmd('blockdev-snapshot', { -+ 'node': 'snap1', -+ 'overlay': 'snap2', -+ }) -+ -+ self.qsd.cmd('block-commit', { -+ 'job-id': 'commit-snap1', -+ 'device': 'snap2', -+ 'top-node': 'snap1', -+ 'base-node': 'node0', -+ 'auto-finalize': True, -+ 'auto-dismiss': False, -+ }) -+ -+ self._wait_for_blockjob('concluded') -+ self.qsd.cmd('job-dismiss', { -+ 'id': 'commit-snap1', -+ }) -+ -+ self.qsd.cmd('block-commit', { -+ 'job-id': 'commit-snap2', -+ 'device': 'snap2', -+ 'top-node': 'snap2', -+ 'base-node': 'node0', -+ 'auto-finalize': True, -+ 'auto-dismiss': False, -+ }) -+ -+ self._wait_for_blockjob('ready') -+ self.qsd.cmd('job-complete', { -+ 'id': 'commit-snap2', -+ }) -+ -+ self._wait_for_blockjob('concluded') -+ self.qsd.cmd('job-dismiss', { -+ 'id': 'commit-snap2', -+ }) -+ -+ self.qsd.cmd('blockdev-del', { -+ 'node-name': 'snap1' -+ }) -+ self.qsd.cmd('blockdev-del', { -+ 'node-name': 'snap2' -+ }) - - bench_thr.join() - -diff --git a/tests/qemu-iotests/tests/graph-changes-while-io.out -b/tests/qemu-iotests/tests/graph-changes-while-io.out -index fbc63e62f8..8d7e996700 100644 ---- a/tests/qemu-iotests/tests/graph-changes-while-io.out -+++ b/tests/qemu-iotests/tests/graph-changes-while-io.out -@@ -1,5 +1,5 @@ --.. -+... 
- ---------------------------------------------------------------------- --Ran 2 tests -+Ran 3 tests - - OK --- -2.43.5 - -Am 24.04.25 um 19:32 schrieb Andrey Drobyshev: -> -So it looks like main thread is processing job-dismiss request and is -> -holding write lock taken in block_job_remove_all_bdrv() (frame #20 -> -above). At the same time iothread spawns a coroutine which performs IO -> -request. Before the coroutine is spawned, blk_aio_prwv() increases -> -'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is -> -trying to acquire the read lock. But main thread isn't releasing the -> -lock as blk_root_drained_poll() returns true since blk->in_flight > 0. -> -Here's the deadlock. -And for the IO test you provided, it's client->nb_requests that behaves -similarly to blk->in_flight here. - -The issue also reproduces easily when issuing the following QMP command -in a loop while doing IO on a device: - -> -void qmp_block_locked_drain(const char *node_name, Error **errp) -> -{ -> -BlockDriverState *bs; -> -> -bs = bdrv_find_node(node_name); -> -if (!bs) { -> -error_setg(errp, "node not found"); -> -return; -> -} -> -> -bdrv_graph_wrlock(); -> -bdrv_drained_begin(bs); -> -bdrv_drained_end(bs); -> -bdrv_graph_wrunlock(); -> -} -It seems like either it would be necessary to require: -1. not draining inside an exclusively locked section -or -2. making sure that variables used by drained_poll routines are only set -while holding the reader lock -? - -Those seem to require rather involved changes, so a third option might -be to make draining inside an exclusively locked section possible, by -embedding such locked sections in a drained section: - -> -diff --git a/blockjob.c b/blockjob.c -> -index 32007f31a9..9b2f3b3ea9 100644 -> ---- a/blockjob.c -> -+++ b/blockjob.c -> -@@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job) -> -* one to make sure that such a concurrent access does not attempt -> -* to process an already freed BdrvChild. 
-> -*/ -> -+ bdrv_drain_all_begin(); -> -bdrv_graph_wrlock(); -> -while (job->nodes) { -> -GSList *l = job->nodes; -> -@@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job) -> -g_slist_free_1(l); -> -} -> -bdrv_graph_wrunlock(); -> -+ bdrv_drain_all_end(); -> -} -> -> -bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs) -This seems to fix the issue at hand. I can send a patch if this is -considered an acceptable approach. - -Best Regards, -Fiona - -On 4/30/25 11:47 AM, Fiona Ebner wrote: -> -Am 24.04.25 um 19:32 schrieb Andrey Drobyshev: -> -> So it looks like main thread is processing job-dismiss request and is -> -> holding write lock taken in block_job_remove_all_bdrv() (frame #20 -> -> above). At the same time iothread spawns a coroutine which performs IO -> -> request. Before the coroutine is spawned, blk_aio_prwv() increases -> -> 'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is -> -> trying to acquire the read lock. But main thread isn't releasing the -> -> lock as blk_root_drained_poll() returns true since blk->in_flight > 0. -> -> Here's the deadlock. -> -> -And for the IO test you provided, it's client->nb_requests that behaves -> -similarly to blk->in_flight here. -> -> -The issue also reproduces easily when issuing the following QMP command -> -in a loop while doing IO on a device: -> -> -> void qmp_block_locked_drain(const char *node_name, Error **errp) -> -> { -> -> BlockDriverState *bs; -> -> -> -> bs = bdrv_find_node(node_name); -> -> if (!bs) { -> -> error_setg(errp, "node not found"); -> -> return; -> -> } -> -> -> -> bdrv_graph_wrlock(); -> -> bdrv_drained_begin(bs); -> -> bdrv_drained_end(bs); -> -> bdrv_graph_wrunlock(); -> -> } -> -> -It seems like either it would be necessary to require: -> -1. not draining inside an exclusively locked section -> -or -> -2. making sure that variables used by drained_poll routines are only set -> -while holding the reader lock -> -? 
-> -> -Those seem to require rather involved changes, so a third option might -> -be to make draining inside an exclusively locked section possible, by -> -embedding such locked sections in a drained section: -> -> -> diff --git a/blockjob.c b/blockjob.c -> -> index 32007f31a9..9b2f3b3ea9 100644 -> -> --- a/blockjob.c -> -> +++ b/blockjob.c -> -> @@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job) -> -> * one to make sure that such a concurrent access does not attempt -> -> * to process an already freed BdrvChild. -> -> */ -> -> + bdrv_drain_all_begin(); -> -> bdrv_graph_wrlock(); -> -> while (job->nodes) { -> -> GSList *l = job->nodes; -> -> @@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job) -> -> g_slist_free_1(l); -> -> } -> -> bdrv_graph_wrunlock(); -> -> + bdrv_drain_all_end(); -> -> } -> -> -> -> bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs) -> -> -This seems to fix the issue at hand. I can send a patch if this is -> -considered an acceptable approach. -> -> -Best Regards, -> -Fiona -> -Hello Fiona, - -Thanks for looking into it. I've tried your 3rd option above and can -confirm it does fix the deadlock, at least I can't reproduce it. Other -iotests also don't seem to be breaking. So I personally am fine with -that patch. Would be nice to hear a word from the maintainers though on -whether there're any caveats with such approach. - -Andrey - -On Wed, Apr 30, 2025 at 10:11 AM Andrey Drobyshev -<andrey.drobyshev@virtuozzo.com> wrote: -> -> -On 4/30/25 11:47 AM, Fiona Ebner wrote: -> -> Am 24.04.25 um 19:32 schrieb Andrey Drobyshev: -> ->> So it looks like main thread is processing job-dismiss request and is -> ->> holding write lock taken in block_job_remove_all_bdrv() (frame #20 -> ->> above). At the same time iothread spawns a coroutine which performs IO -> ->> request. Before the coroutine is spawned, blk_aio_prwv() increases -> ->> 'in_flight' counter for Blk. 
Then blk_co_do_preadv_part() (frame #5) is -> ->> trying to acquire the read lock. But main thread isn't releasing the -> ->> lock as blk_root_drained_poll() returns true since blk->in_flight > 0. -> ->> Here's the deadlock. -> -> -> -> And for the IO test you provided, it's client->nb_requests that behaves -> -> similarly to blk->in_flight here. -> -> -> -> The issue also reproduces easily when issuing the following QMP command -> -> in a loop while doing IO on a device: -> -> -> ->> void qmp_block_locked_drain(const char *node_name, Error **errp) -> ->> { -> ->> BlockDriverState *bs; -> ->> -> ->> bs = bdrv_find_node(node_name); -> ->> if (!bs) { -> ->> error_setg(errp, "node not found"); -> ->> return; -> ->> } -> ->> -> ->> bdrv_graph_wrlock(); -> ->> bdrv_drained_begin(bs); -> ->> bdrv_drained_end(bs); -> ->> bdrv_graph_wrunlock(); -> ->> } -> -> -> -> It seems like either it would be necessary to require: -> -> 1. not draining inside an exclusively locked section -> -> or -> -> 2. making sure that variables used by drained_poll routines are only set -> -> while holding the reader lock -> -> ? -> -> -> -> Those seem to require rather involved changes, so a third option might -> -> be to make draining inside an exclusively locked section possible, by -> -> embedding such locked sections in a drained section: -> -> -> ->> diff --git a/blockjob.c b/blockjob.c -> ->> index 32007f31a9..9b2f3b3ea9 100644 -> ->> --- a/blockjob.c -> ->> +++ b/blockjob.c -> ->> @@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job) -> ->> * one to make sure that such a concurrent access does not attempt -> ->> * to process an already freed BdrvChild. 
-> ->> */ -> ->> + bdrv_drain_all_begin(); -> ->> bdrv_graph_wrlock(); -> ->> while (job->nodes) { -> ->> GSList *l = job->nodes; -> ->> @@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job) -> ->> g_slist_free_1(l); -> ->> } -> ->> bdrv_graph_wrunlock(); -> ->> + bdrv_drain_all_end(); -> ->> } -> ->> -> ->> bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs) -> -> -> -> This seems to fix the issue at hand. I can send a patch if this is -> -> considered an acceptable approach. -Kevin is aware of this thread but it's a public holiday tomorrow so it -may be a little longer. - -Stefan - -Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben: -> -Hi all, -> -> -There's a bug in block layer which leads to block graph deadlock. -> -Notably, it takes place when blockdev IO is processed within a separate -> -iothread. -> -> -This was initially caught by our tests, and I was able to reduce it to a -> -relatively simple reproducer. Such deadlocks are probably supposed to -> -be covered in iotests/graph-changes-while-io, but this deadlock isn't. -> -> -Basically what the reproducer does is launches QEMU with a drive having -> -'iothread' option set, creates a chain of 2 snapshots, launches -> -block-commit job for a snapshot and then dismisses the job, starting -> -from the lower snapshot. If the guest is issuing IO at the same time, -> -there's a race in acquiring block graph lock and a potential deadlock. -> -> -Here's how it can be reproduced: -> -> -1. 
Run QEMU: -> -> SRCDIR=/path/to/srcdir -> -> -> -> -> -> -> -> -> -> $SRCDIR/build/qemu-system-x86_64 -enable-kvm \ -> -> -> -> -machine q35 -cpu Nehalem \ -> -> -> -> -name guest=alma8-vm,debug-threads=on \ -> -> -> -> -m 2g -smp 2 \ -> -> -> -> -nographic -nodefaults \ -> -> -> -> -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \ -> -> -> -> -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \ -> -> -> -> -object iothread,id=iothread0 \ -> -> -> -> -blockdev -> -> node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2 -> -> \ -> -> -device virtio-blk-pci,drive=disk,iothread=iothread0 -> -> -2. Launch IO (random reads) from within the guest: -> -> nc -U /var/run/alma8-serial.sock -> -> ... -> -> [root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k -> -> --size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting -> -> --rw=randread --iodepth=1 --filename=/testfile -> -> -3. Run snapshots creation & removal of lower snapshot operation in a -> -loop (script attached): -> -> while /bin/true ; do ./remove_lower_snap.sh ; done -> -> -And then it occasionally hangs. -> -> -Note: I've tried bisecting this, and looks like deadlock occurs starting -> -from the following commit: -> -> -(BAD) 5bdbaebcce virtio: Re-enable notifications after drain -> -(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll -> -> -On the latest v10.0.0 it does hang as well. 
-> -> -> -Here's backtrace of the main thread: -> -> -> #0 0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1, -> -> timeout=<optimized out>, sigmask=0x0) at -> -> ../sysdeps/unix/sysv/linux/ppoll.c:43 -> -> #1 0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1, -> -> timeout=-1) at ../util/qemu-timer.c:329 -> -> #2 0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20, -> -> ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79 -> -> #3 0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at -> -> ../util/aio-posix.c:730 -> -> #4 0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950, -> -> parent=0x0, poll=true) at ../block/io.c:378 -> -> #5 0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at -> -> ../block/io.c:391 -> -> #6 0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7682 -> -> #7 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7608 -> -> #8 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7668 -> -> #9 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7608 -> -> #10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7668 -> -> #11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7608 -> -> #12 
0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../blockjob.c:157 -> -> #13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7592 -> -> #14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7661 -> -> #15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx -> -> (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = -> -> {...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234 -> -> #16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7592 -> -> #17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7661 -> -> #18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0, -> -> ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715 -> -> #19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at -> -> ../block.c:3317 -> -> #20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at -> -> ../blockjob.c:209 -> -> #21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at -> -> ../blockjob.c:82 -> -> #22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at -> -> ../job.c:474 -> -> #23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at -> -> ../job.c:771 -> -> #24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400, -> -> errp=0x7ffd94b4f488) at ../job.c:783 -> -> --Type <RET> for more, q to quit, c to continue 
without paging-- -> -> #25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 -> -> "commit-snap1", errp=0x7ffd94b4f488) at ../job-qmp.c:138 -> -> #26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0, -> -> ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221 -> -> #27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at -> -> ../qapi/qmp-dispatch.c:128 -> -> #28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at -> -> ../util/async.c:172 -> -> #29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at -> -> ../util/async.c:219 -> -> #30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at -> -> ../util/aio-posix.c:436 -> -> #31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200, -> -> callback=0x0, user_data=0x0) at ../util/async.c:361 -> -> #32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at -> -> ../glib/gmain.c:3364 -> -> #33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079 -> -> #34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287 -> -> #35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at -> -> ../util/main-loop.c:310 -> -> #36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at -> -> ../util/main-loop.c:589 -> -> #37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835 -> -> #38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at -> -> ../system/main.c:50 -> -> #39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at -> -> ../system/main.c:80 -> -> -> -And here's coroutine trying to acquire read lock: -> -> -> (gdb) qemu coroutine reader_queue->entries.sqh_first -> -> #0 0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0, -> -> to_=0x7fc537fff508, action=COROUTINE_YIELD) at -> -> ../util/coroutine-ucontext.c:321 -> -> #1 0x0000557eb47d4d4a in qemu_coroutine_yield () at -> -> ../util/qemu-coroutine.c:339 -> -> #2 0x0000557eb47d56c8 in qemu_co_queue_wait_impl 
(queue=0x557eb59954c0 -> -> <reader_queue>, lock=0x7fc53c57de50, flags=0) at -> -> ../util/qemu-coroutine-lock.c:60 -> -> #3 0x0000557eb461fea7 in bdrv_graph_co_rdlock () at -> -> ../block/graph-lock.c:231 -> -> #4 0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) at -> -> /home/root/src/qemu/master/include/block/graph-lock.h:213 -> -> #5 0x0000557eb460fa41 in blk_co_do_preadv_part -> -> (blk=0x557eb84c0810, offset=6890553344, bytes=4096, -> -> qiov=0x7fc530006988, qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at -> -> ../block/block-backend.c:1339 -> -> #6 0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at -> -> ../block/block-backend.c:1619 -> -> #7 0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886) -> -> at ../util/coroutine-ucontext.c:175 -> -> #8 0x00007fc547c2a360 in __start_context () at -> -> ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 -> -> #9 0x00007ffd94b4ea40 in () -> -> #10 0x0000000000000000 in () -> -> -> -So it looks like main thread is processing job-dismiss request and is -> -holding write lock taken in block_job_remove_all_bdrv() (frame #20 -> -above). At the same time iothread spawns a coroutine which performs IO -> -request. Before the coroutine is spawned, blk_aio_prwv() increases -> -'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is -> -trying to acquire the read lock. But main thread isn't releasing the -> -lock as blk_root_drained_poll() returns true since blk->in_flight > 0. -> -Here's the deadlock. -> -> -Any comments and suggestions on the subject are welcomed. Thanks! -I think this is what the blk_wait_while_drained() call was supposed to -address in blk_co_do_preadv_part(). However, with the use of multiple -I/O threads, this is racy. - -Do you think that in your case we hit the small race window between the -checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there -another reason why blk_wait_while_drained() didn't do its job? 
- -Kevin - -On 5/2/25 19:34, Kevin Wolf wrote: -Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben: -Hi all, - -There's a bug in block layer which leads to block graph deadlock. -Notably, it takes place when blockdev IO is processed within a separate -iothread. - -This was initially caught by our tests, and I was able to reduce it to a -relatively simple reproducer. Such deadlocks are probably supposed to -be covered in iotests/graph-changes-while-io, but this deadlock isn't. - -Basically what the reproducer does is launches QEMU with a drive having -'iothread' option set, creates a chain of 2 snapshots, launches -block-commit job for a snapshot and then dismisses the job, starting -from the lower snapshot. If the guest is issuing IO at the same time, -there's a race in acquiring block graph lock and a potential deadlock. - -Here's how it can be reproduced: - -1. Run QEMU: -SRCDIR=/path/to/srcdir -$SRCDIR/build/qemu-system-x86_64 -enable-kvm \ --machine q35 -cpu Nehalem \ - -name guest=alma8-vm,debug-threads=on \ - -m 2g -smp 2 \ - -nographic -nodefaults \ - -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \ - -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \ - -object iothread,id=iothread0 \ - -blockdev -node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2 - \ - -device virtio-blk-pci,drive=disk,iothread=iothread0 -2. Launch IO (random reads) from within the guest: -nc -U /var/run/alma8-serial.sock -... -[root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k ---size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting ---rw=randread --iodepth=1 --filename=/testfile -3. Run snapshots creation & removal of lower snapshot operation in a -loop (script attached): -while /bin/true ; do ./remove_lower_snap.sh ; done -And then it occasionally hangs. 
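The hang that this reproducer triggers is discussed in the thread as an ordering problem: the main thread takes the graph write lock first and only then starts a drain, while a request that is already counted in in_flight waits for the read lock. A minimal sketch of that ordering as a deterministic toy state machine follows. This is not QEMU code: `BlockState`, `submit_request` and `simulate` are hypothetical names, and the rule that a request arriving during a drain is parked without being counted is an assumption made for the model.

```python
from dataclasses import dataclass


@dataclass
class BlockState:
    """Toy state; the field names are illustrative, not QEMU's."""
    write_locked: bool = False  # main thread holds the graph write lock
    draining: bool = False      # a drained section is active
    in_flight: int = 0          # requests that the drain poll waits for
    request: str = "none"       # none | queued | waiting_rdlock | done


def submit_request(s):
    """I/O thread: count the request, then try to enter the read section."""
    s.in_flight += 1                  # counted before the coroutine runs
    if s.draining:
        s.in_flight -= 1              # assumption: parked requests are not counted
        s.request = "queued"
    elif s.write_locked:
        s.request = "waiting_rdlock"  # blocked on the read lock, still counted
    else:
        s.in_flight -= 1              # nothing in the way: runs to completion
        s.request = "done"


def simulate(order):
    """Run the two main-thread steps in `order`; one guest request
    arrives between the first and the second step."""
    s = BlockState()
    for i, step in enumerate(order):
        if step == "wrlock":
            s.write_locked = True
        elif step == "drain":
            s.draining = True
            if s.in_flight > 0:
                # The poll waits for in_flight == 0, but the only counted
                # request waits for a lock that the poller itself holds.
                return "deadlock"
        else:
            raise ValueError(f"unknown step: {step}")
        if i == 0:
            submit_request(s)
    # Graph change finished: drop the lock, end the drain, resume parked I/O.
    s.write_locked = False
    s.draining = False
    if s.request == "queued":
        s.request = "done"
    return "ok" if s.request == "done" and s.in_flight == 0 else "stuck"
```

In this model `simulate(("wrlock", "drain"))` ends in the deadlock state, while `simulate(("drain", "wrlock"))` completes, which is the ordering the thread converges on (drain before taking the write lock).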
- -Note: I've tried bisecting this, and looks like deadlock occurs starting -from the following commit: - -(BAD) 5bdbaebcce virtio: Re-enable notifications after drain -(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll - -On the latest v10.0.0 it does hang as well. - - -Here's backtrace of the main thread: -#0 0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1, timeout=<optimized -out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:43 -#1 0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1, timeout=-1) -at ../util/qemu-timer.c:329 -#2 0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20, -ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79 -#3 0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at -../util/aio-posix.c:730 -#4 0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950, parent=0x0, -poll=true) at ../block/io.c:378 -#5 0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at -../block/io.c:391 -#6 0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7682 -#7 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7608 -#8 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7668 -#9 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7608 -#10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7668 -#11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80, -ctx=0x557eb76c5f20, 
visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7608 -#12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../blockjob.c:157 -#13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7592 -#14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7661 -#15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx - (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -tran=0x557eb7a87160, errp=0x0) at ../block.c:1234 -#16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7592 -#17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7661 -#18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0, -ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715 -#19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at -../block.c:3317 -#20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at -../blockjob.c:209 -#21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at -../blockjob.c:82 -#22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at ../job.c:474 -#23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at -../job.c:771 -#24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400, -errp=0x7ffd94b4f488) at ../job.c:783 ---Type <RET> for more, q to quit, c to continue without paging-- -#25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 
"commit-snap1", -errp=0x7ffd94b4f488) at ../job-qmp.c:138 -#26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0, -ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221 -#27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at -../qapi/qmp-dispatch.c:128 -#28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at ../util/async.c:172 -#29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at -../util/async.c:219 -#30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at -../util/aio-posix.c:436 -#31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200, -callback=0x0, user_data=0x0) at ../util/async.c:361 -#32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at -../glib/gmain.c:3364 -#33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079 -#34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287 -#35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at -../util/main-loop.c:310 -#36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at -../util/main-loop.c:589 -#37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835 -#38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at ../system/main.c:50 -#39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at -../system/main.c:80 -And here's coroutine trying to acquire read lock: -(gdb) qemu coroutine reader_queue->entries.sqh_first -#0 0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0, -to_=0x7fc537fff508, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:321 -#1 0x0000557eb47d4d4a in qemu_coroutine_yield () at -../util/qemu-coroutine.c:339 -#2 0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0 -<reader_queue>, lock=0x7fc53c57de50, flags=0) at -../util/qemu-coroutine-lock.c:60 -#3 0x0000557eb461fea7 in bdrv_graph_co_rdlock () at ../block/graph-lock.c:231 -#4 0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) at 
-/home/root/src/qemu/master/include/block/graph-lock.h:213 -#5 0x0000557eb460fa41 in blk_co_do_preadv_part - (blk=0x557eb84c0810, offset=6890553344, bytes=4096, qiov=0x7fc530006988, -qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at ../block/block-backend.c:1339 -#6 0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at -../block/block-backend.c:1619 -#7 0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886) at -../util/coroutine-ucontext.c:175 -#8 0x00007fc547c2a360 in __start_context () at -../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 -#9 0x00007ffd94b4ea40 in () -#10 0x0000000000000000 in () -So it looks like main thread is processing job-dismiss request and is -holding write lock taken in block_job_remove_all_bdrv() (frame #20 -above). At the same time iothread spawns a coroutine which performs IO -request. Before the coroutine is spawned, blk_aio_prwv() increases -'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is -trying to acquire the read lock. But main thread isn't releasing the -lock as blk_root_drained_poll() returns true since blk->in_flight > 0. -Here's the deadlock. - -Any comments and suggestions on the subject are welcomed. Thanks! -I think this is what the blk_wait_while_drained() call was supposed to -address in blk_co_do_preadv_part(). However, with the use of multiple -I/O threads, this is racy. - -Do you think that in your case we hit the small race window between the -checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there -another reason why blk_wait_while_drained() didn't do its job? - -Kevin -At my opinion there is very big race window. Main thread has -eaten graph write lock. After that another coroutine is stalled -within GRAPH_RDLOCK_GUARD() as there is no drain at the moment and only -after that main thread has started drain. That is why Fiona's idea is -looking working. 
Though this would mean that normally we should always -do that at the moment when we acquire write lock. May be even inside -this function. Den - -Am 02.05.2025 um 19:52 hat Denis V. Lunev geschrieben: -> -On 5/2/25 19:34, Kevin Wolf wrote: -> -> Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben: -> -> > Hi all, -> -> > -> -> > There's a bug in block layer which leads to block graph deadlock. -> -> > Notably, it takes place when blockdev IO is processed within a separate -> -> > iothread. -> -> > -> -> > This was initially caught by our tests, and I was able to reduce it to a -> -> > relatively simple reproducer. Such deadlocks are probably supposed to -> -> > be covered in iotests/graph-changes-while-io, but this deadlock isn't. -> -> > -> -> > Basically what the reproducer does is launches QEMU with a drive having -> -> > 'iothread' option set, creates a chain of 2 snapshots, launches -> -> > block-commit job for a snapshot and then dismisses the job, starting -> -> > from the lower snapshot. If the guest is issuing IO at the same time, -> -> > there's a race in acquiring block graph lock and a potential deadlock. -> -> > -> -> > Here's how it can be reproduced: -> -> > -> -> > 1. Run QEMU: -> -> > > SRCDIR=/path/to/srcdir -> -> > > $SRCDIR/build/qemu-system-x86_64 -enable-kvm \ -> -> > > -machine q35 -cpu Nehalem \ -> -> > > -name guest=alma8-vm,debug-threads=on \ -> -> > > -m 2g -smp 2 \ -> -> > > -nographic -nodefaults \ -> -> > > -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \ -> -> > > -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \ -> -> > > -object iothread,id=iothread0 \ -> -> > > -blockdev -> -> > > node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2 -> -> > > \ -> -> > > -device virtio-blk-pci,drive=disk,iothread=iothread0 -> -> > 2. Launch IO (random reads) from within the guest: -> -> > > nc -U /var/run/alma8-serial.sock -> -> > > ... 
-> -> > > [root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 -> -> > > --bs=4k --size=1G --numjobs=1 --time_based=1 --runtime=300 -> -> > > --group_reporting --rw=randread --iodepth=1 --filename=/testfile -> -> > 3. Run snapshots creation & removal of lower snapshot operation in a -> -> > loop (script attached): -> -> > > while /bin/true ; do ./remove_lower_snap.sh ; done -> -> > And then it occasionally hangs. -> -> > -> -> > Note: I've tried bisecting this, and looks like deadlock occurs starting -> -> > from the following commit: -> -> > -> -> > (BAD) 5bdbaebcce virtio: Re-enable notifications after drain -> -> > (GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll -> -> > -> -> > On the latest v10.0.0 it does hang as well. -> -> > -> -> > -> -> > Here's backtrace of the main thread: -> -> > -> -> > > #0 0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1, -> -> > > timeout=<optimized out>, sigmask=0x0) at -> -> > > ../sysdeps/unix/sysv/linux/ppoll.c:43 -> -> > > #1 0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1, -> -> > > timeout=-1) at ../util/qemu-timer.c:329 -> -> > > #2 0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20, -> -> > > ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79 -> -> > > #3 0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) -> -> > > at ../util/aio-posix.c:730 -> -> > > #4 0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950, -> -> > > parent=0x0, poll=true) at ../block/io.c:378 -> -> > > #5 0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at -> -> > > ../block/io.c:391 -> -> > > #6 0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950, -> -> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7682 -> -> > > #7 0x0000557eb45ebf2b in bdrv_child_change_aio_context -> -> > > (c=0x557eb7964250, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> 
-> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7608 -> -> > > #8 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0, -> -> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7668 -> -> > > #9 0x0000557eb45ebf2b in bdrv_child_change_aio_context -> -> > > (c=0x557eb7e59110, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7608 -> -> > > #10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960, -> -> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7668 -> -> > > #11 0x0000557eb45ebf2b in bdrv_child_change_aio_context -> -> > > (c=0x557eb814ed80, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7608 -> -> > > #12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0, -> -> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../blockjob.c:157 -> -> > > #13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context -> -> > > (c=0x557eb7c9d3f0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7592 -> -> > > #14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310, -> -> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7661 -> -> > > #15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx -> -> > > (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 -> -> > > = {...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234 -> -> > > #16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context -> -> > > (c=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7592 -> 
-> > > #17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0, -> -> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7661 -> -> > > #18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context -> -> > > (bs=0x557eb79575e0, ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at -> -> > > ../block.c:7715 -> -> > > #19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) -> -> > > at ../block.c:3317 -> -> > > #20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv -> -> > > (job=0x557eb7952800) at ../blockjob.c:209 -> -> > > #21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at -> -> > > ../blockjob.c:82 -> -> > > #22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at -> -> > > ../job.c:474 -> -> > > #23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at -> -> > > ../job.c:771 -> -> > > #24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400, -> -> > > errp=0x7ffd94b4f488) at ../job.c:783 -> -> > > --Type <RET> for more, q to quit, c to continue without paging-- -> -> > > #25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 -> -> > > "commit-snap1", errp=0x7ffd94b4f488) at ../job-qmp.c:138 -> -> > > #26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0, -> -> > > ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221 -> -> > > #27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at -> -> > > ../qapi/qmp-dispatch.c:128 -> -> > > #28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at -> -> > > ../util/async.c:172 -> -> > > #29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at -> -> > > ../util/async.c:219 -> -> > > #30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at -> -> > > ../util/aio-posix.c:436 -> -> > > #31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200, -> -> > > callback=0x0, user_data=0x0) at ../util/async.c:361 -> -> > > #32 
0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at -> -> > > ../glib/gmain.c:3364 -> -> > > #33 g_main_context_dispatch (context=0x557eb76c6430) at -> -> > > ../glib/gmain.c:4079 -> -> > > #34 0x0000557eb47d3ab1 in glib_pollfds_poll () at -> -> > > ../util/main-loop.c:287 -> -> > > #35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at -> -> > > ../util/main-loop.c:310 -> -> > > #36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at -> -> > > ../util/main-loop.c:589 -> -> > > #37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835 -> -> > > #38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at -> -> > > ../system/main.c:50 -> -> > > #39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at -> -> > > ../system/main.c:80 -> -> > -> -> > And here's coroutine trying to acquire read lock: -> -> > -> -> > > (gdb) qemu coroutine reader_queue->entries.sqh_first -> -> > > #0 0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0, -> -> > > to_=0x7fc537fff508, action=COROUTINE_YIELD) at -> -> > > ../util/coroutine-ucontext.c:321 -> -> > > #1 0x0000557eb47d4d4a in qemu_coroutine_yield () at -> -> > > ../util/qemu-coroutine.c:339 -> -> > > #2 0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0 -> -> > > <reader_queue>, lock=0x7fc53c57de50, flags=0) at -> -> > > ../util/qemu-coroutine-lock.c:60 -> -> > > #3 0x0000557eb461fea7 in bdrv_graph_co_rdlock () at -> -> > > ../block/graph-lock.c:231 -> -> > > #4 0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) -> -> > > at /home/root/src/qemu/master/include/block/graph-lock.h:213 -> -> > > #5 0x0000557eb460fa41 in blk_co_do_preadv_part -> -> > > (blk=0x557eb84c0810, offset=6890553344, bytes=4096, -> -> > > qiov=0x7fc530006988, qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at -> -> > > ../block/block-backend.c:1339 -> -> > > #6 0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at -> -> > > ../block/block-backend.c:1619 -> 
-> > > #7 0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, -> -> > > i1=21886) at ../util/coroutine-ucontext.c:175 -> -> > > #8 0x00007fc547c2a360 in __start_context () at -> -> > > ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 -> -> > > #9 0x00007ffd94b4ea40 in () -> -> > > #10 0x0000000000000000 in () -> -> > -> -> > So it looks like main thread is processing job-dismiss request and is -> -> > holding write lock taken in block_job_remove_all_bdrv() (frame #20 -> -> > above). At the same time iothread spawns a coroutine which performs IO -> -> > request. Before the coroutine is spawned, blk_aio_prwv() increases -> -> > 'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is -> -> > trying to acquire the read lock. But main thread isn't releasing the -> -> > lock as blk_root_drained_poll() returns true since blk->in_flight > 0. -> -> > Here's the deadlock. -> -> > -> -> > Any comments and suggestions on the subject are welcomed. Thanks! -> -> I think this is what the blk_wait_while_drained() call was supposed to -> -> address in blk_co_do_preadv_part(). However, with the use of multiple -> -> I/O threads, this is racy. -> -> -> -> Do you think that in your case we hit the small race window between the -> -> checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there -> -> another reason why blk_wait_while_drained() didn't do its job? -> -> -> -At my opinion there is very big race window. Main thread has -> -eaten graph write lock. After that another coroutine is stalled -> -within GRAPH_RDLOCK_GUARD() as there is no drain at the moment and only -> -after that main thread has started drain. -You're right, I confused taking the write lock with draining there. - -> -That is why Fiona's idea is looking working. Though this would mean -> -that normally we should always do that at the moment when we acquire -> -write lock. May be even inside this function. 
-I actually see now that not all of my graph locking patches were merged. -At least I did have the thought that bdrv_drained_begin() must be marked -GRAPH_UNLOCKED because it polls. That means that calling it from inside -bdrv_try_change_aio_context() is actually forbidden (and that's the part -I didn't see back then because it doesn't have TSA annotations). - -If you refactor the code to move the drain out to before the lock is -taken, I think you end up with Fiona's patch, except you'll remove the -forbidden inner drain and add more annotations for some functions and -clarify the rules around them. I don't know, but I wouldn't be surprised -if along the process we find other bugs, too. - -So Fiona's drain looks right to me, but we should probably approach it -more systematically. - -Kevin - diff --git a/results/classifier/016/virtual/24930826 b/results/classifier/016/virtual/24930826 deleted file mode 100644 index 23479fd3..00000000 --- a/results/classifier/016/virtual/24930826 +++ /dev/null @@ -1,60 +0,0 @@ -virtual: 0.989 -hypervisor: 0.884 -debug: 0.787 -user-level: 0.571 -device: 0.253 -operating system: 0.212 -x86: 0.071 -TCG: 0.045 -network: 0.044 -files: 0.037 -peripherals: 0.036 -register: 0.020 -KVM: 0.018 -PID: 0.016 -socket: 0.012 -i386: 0.007 -VMM: 0.006 -kernel: 0.004 -semantic: 0.004 -performance: 0.003 -architecture: 0.003 -assembly: 0.002 -alpha: 0.002 -permissions: 0.002 -vnc: 0.002 -boot: 0.001 -graphic: 0.001 -risc-v: 0.001 -ppc: 0.001 -arm: 0.001 -mistranslation: 0.000 - -[Qemu-devel] [BUG] vhost-user: hot-unplug vhost-user nic for windows guest OS will fail with 100% reproduce rate - -Hi, guys - -I met a problem when hot-unplug vhost-user nic for Windows 2008 rc2 sp1 64 -(Guest OS) - -The xml of nic is as followed: -<interface type='vhostuser'> - <mac address='52:54:00:3b:83:aa'/> - <source type='unix' path='/var/run/vhost-user/port1' mode='client'/> - <target dev='port1'/> - <model type='virtio'/> - <driver queues='4'/> - <address type='pci' 
domain='0x0000' bus='0x00' slot='0x08' function='0x0'/> -</interface> - -Firstly, I use virsh attach-device win2008 vif.xml to hot-plug a nic for Guest -OS. This operation returns success. -After guest OS discover nic successfully, I use virsh detach-device win2008 -vif.xml to hot-unplug it. This operation will fail with 100% reproduce rate. - -However, if I hot-plug and hot-unplug virtio-net nic , it will not fail. - -I have analysis the process of qmp_device_del , I found that qemu have inject -interrupt to acpi to let it notice guest OS to remove nic. -I guess there is something wrong in Windows when handle the interrupt. - diff --git a/results/classifier/016/virtual/25892827 b/results/classifier/016/virtual/25892827 deleted file mode 100644 index 79d1a51a..00000000 --- a/results/classifier/016/virtual/25892827 +++ /dev/null @@ -1,1104 +0,0 @@ -virtual: 0.955 -KVM: 0.881 -x86: 0.715 -debug: 0.617 -hypervisor: 0.546 -PID: 0.132 -register: 0.102 -performance: 0.058 -files: 0.056 -operating system: 0.051 -kernel: 0.049 -boot: 0.025 -assembly: 0.017 -device: 0.014 -socket: 0.013 -semantic: 0.009 -user-level: 0.008 -TCG: 0.008 -risc-v: 0.007 -architecture: 0.006 -ppc: 0.005 -VMM: 0.005 -permissions: 0.004 -vnc: 0.004 -network: 0.004 -peripherals: 0.003 -arm: 0.003 -i386: 0.003 -alpha: 0.002 -graphic: 0.002 -mistranslation: 0.000 - -[Qemu-devel] [BUG/RFC] Two cpus are not brought up normally in SLES11 sp3 VM after reboot - -Hi, - -Recently we encountered a problem in our project: 2 CPUs in VM are not brought -up normally after reboot. - -Our host is using KVM kmod 3.6 and QEMU 2.1. -A SLES 11 sp3 VM configured with 8 vcpus, -cpu model is configured with 'host-passthrough'. - -After VM's first time started up, everything seems to be OK. -and then VM is paniced and rebooted. -After reboot, only 6 cpus are brought up in VM, cpu1 and cpu7 are not online. 
-
-This is the only message we can get from VM:
-VM dmesg shows:
-[ 0.069867] Booting Node 0, Processors #1
-[ 5.060042] CPU1: Stuck ??
-[ 5.060499] #2
-[ 5.088322] kvm-clock: cpu 2, msr 6:3fc90901, secondary cpu clock
-[ 5.088335] KVM setup async PF for cpu 2
-[ 5.092967] NMI watchdog enabled, takes one hw-pmu counter.
-[ 5.094405] #3
-[ 5.108324] kvm-clock: cpu 3, msr 6:3fcd0901, secondary cpu clock
-[ 5.108333] KVM setup async PF for cpu 3
-[ 5.113553] NMI watchdog enabled, takes one hw-pmu counter.
-[ 5.114970] #4
-[ 5.128325] kvm-clock: cpu 4, msr 6:3fd10901, secondary cpu clock
-[ 5.128336] KVM setup async PF for cpu 4
-[ 5.134576] NMI watchdog enabled, takes one hw-pmu counter.
-[ 5.135998] #5
-[ 5.152324] kvm-clock: cpu 5, msr 6:3fd50901, secondary cpu clock
-[ 5.152334] KVM setup async PF for cpu 5
-[ 5.154764] NMI watchdog enabled, takes one hw-pmu counter.
-[ 5.156467] #6
-[ 5.172327] kvm-clock: cpu 6, msr 6:3fd90901, secondary cpu clock
-[ 5.172341] KVM setup async PF for cpu 6
-[ 5.180738] NMI watchdog enabled, takes one hw-pmu counter.
-[ 5.182173] #7 Ok.
-[ 10.170815] CPU7: Stuck ??
-[ 10.171648] Brought up 6 CPUs
-[ 10.172394] Total of 6 processors activated (28799.97 BogoMIPS).
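Since the stuck CPUs may differ from boot to boot, it can help to pull them out of the guest log mechanically. The following is a small sketch, assuming the log uses exactly the message formats shown in the dmesg excerpt above; `summarize_boot` and `DMESG_EXCERPT` are hypothetical names introduced for illustration.

```python
import re

# A few lines copied from the dmesg output above.
DMESG_EXCERPT = """\
[ 0.069867] Booting Node 0, Processors #1
[ 5.060042] CPU1: Stuck ??
[ 5.060499] #2
[ 5.182173] #7 Ok.
[ 10.170815] CPU7: Stuck ??
[ 10.171648] Brought up 6 CPUs
"""


def summarize_boot(log):
    """Return (list of CPUs reported as stuck, number of CPUs brought up)."""
    stuck = [int(n) for n in re.findall(r"CPU(\d+): Stuck", log)]
    match = re.search(r"Brought up (\d+) CPUs", log)
    brought_up = int(match.group(1)) if match else None
    return stuck, brought_up
```

Running `summarize_boot` over the logs of several reboots would show whether it is always the same two CPUs that fail to come up.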
-
-From the host, we found that the QEMU vcpu1 and vcpu7 threads were not
-consuming any CPU (they should be in idle state).
-The stacks of all vCPUs on the host look like this:
-
-[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
-[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
-[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
-[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
-[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
-[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
-[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
-[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
-[<ffffffffffffffff>] 0xffffffffffffffff
-
-We looked into the kernel code that could lead to the above 'Stuck' warning,
-and found that the only possibility is that the emulation of the 'cpuid'
-instruction in kvm/qemu has something wrong.
-But since we can't reproduce this problem, we are not quite sure.
-Is it possible that the cpuid emulation in kvm/qemu has some bug?
-
-Has anyone come across this problem before? Or any ideas?
-
-Thanks,
-zhanghailiang
-
-On 06/07/2015 09:54, zhanghailiang wrote:
->
-> From the host, we found that the QEMU vcpu1 and vcpu7 threads were not
-> consuming any CPU (they should be in idle state).
-> The stacks of all vCPUs on the host look like this:
->
-> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
-> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
-> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
-> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
-> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
-> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
-> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b
-> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
-> [<ffffffffffffffff>] 0xffffffffffffffff
->
-> We looked into the kernel code that could lead to the above 'Stuck'
-> warning,
-> and found that the only possibility is that the emulation of the 'cpuid'
-> instruction in kvm/qemu has something wrong.
-> But since we can't reproduce this problem, we are not quite sure.
-> Is it possible that the cpuid emulation in kvm/qemu has some bug?
-Can you explain the relationship to the cpuid emulation? What do the
-traces say about vcpus 1 and 7?
-
-Paolo
-
-On 2015/7/6 16:45, Paolo Bonzini wrote:
-> On 06/07/2015 09:54, zhanghailiang wrote:
->> From the host, we found that the QEMU vcpu1 and vcpu7 threads were not
->> consuming any CPU (they should be in idle state).
->> The stacks of all vCPUs on the host look like this:
->>
->> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
->> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
->> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
->> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
->> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
->> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
->> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b
->> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
->> [<ffffffffffffffff>] 0xffffffffffffffff
->>
->> We looked into the kernel code that could lead to the above 'Stuck'
->> warning,
->> and found that the only possibility is that the emulation of the 'cpuid'
->> instruction in kvm/qemu has something wrong.
->> But since we can't reproduce this problem, we are not quite sure.
->> Is it possible that the cpuid emulation in kvm/qemu has some bug?
-> Can you explain the relationship to the cpuid emulation? What do the
-> traces say about vcpus 1 and 7?
-OK, we searched the VM's kernel code for the 'Stuck' message, and it is
-located in do_boot_cpu(). It runs in BSP context; the call path is:
-BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu()
--> wakeup_secondary_cpu_via_init() to trigger the APs.
-It waits 5s for the APs to start up; if an AP does not start up normally, it
-prints 'CPU%d Stuck' or 'CPU%d: Not responding'.
-
-If it prints 'Stuck', it means the AP has received the SIPI interrupt and has
-begun to execute the code at
-'ENTRY(trampoline_data)' (trampoline_64.S), but got stuck somewhere before
-smp_callin() (smpboot.c).
-The following is the startup process of the BSP and the APs.
-BSP: -start_kernel() - ->smp_init() - ->smp_boot_cpus() - ->do_boot_cpu() - ->start_ip = trampoline_address(); //set the address that AP will go -to execute - ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU - ->for (timeout = 0; timeout < 50000; timeout++) - if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if AP -startup or not - -APs: -ENTRY(trampoline_data) (trampoline_64.S) - ->ENTRY(secondary_startup_64) (head_64.S) - ->start_secondary() (smpboot.c) - ->cpu_init(); - ->smp_callin(); - ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP comes -here, the BSP will not prints the error message. - -From above call process, we can be sure that, the AP has been stuck between -trampoline_data and the cpumask_set_cpu() in -smp_callin(), we look through these codes path carefully, and only found a -'hlt' instruct that could block the process. -It is located in trampoline_data(): - -ENTRY(trampoline_data) - ... - - call verify_cpu # Verify the cpu supports long mode - testl %eax, %eax # Check for return code - jnz no_longmode - - ... - -no_longmode: - hlt - jmp no_longmode - -For the process verify_cpu(), -we can only find the 'cpuid' sensitive instruct that could lead VM exit from -No-root mode. -This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to -the fail in verify_cpu. - -From the message in VM, we know vcpu1 and vcpu7 is something wrong. -[ 5.060042] CPU1: Stuck ?? -[ 10.170815] CPU7: Stuck ?? -[ 10.171648] Brought up 6 CPUs - -Besides, the follow is the cpus message got from host. 
-80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command -instance-0000000 -* CPU #0: pc=0x00007f64160c683d thread_id=68570 - CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 - CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 - CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 - CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 - CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 - CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 - CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 - -Oh, i also forgot to mention in the above message that, we have bond each vCPU -to different physical CPU in -host. - -Thanks, -zhanghailiang - -On 06/07/2015 11:59, zhanghailiang wrote: -> -> -> -Besides, the follow is the cpus message got from host. -> -80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh -> -qemu-monitor-command instance-0000000 -> -* CPU #0: pc=0x00007f64160c683d thread_id=68570 -> -CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 -> -CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 -> -CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 -> -CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 -> -CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 -> -CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 -> -CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 -> -> -Oh, i also forgot to mention in the above message that, we have bond -> -each vCPU to different physical CPU in -> -host. -Can you capture a trace on the host (trace-cmd record -e kvm) and send -it privately? Please note which CPUs get stuck, since I guess it's not -always 1 and 7. 
- -Paolo - -On Mon, 6 Jul 2015 17:59:10 +0800 -zhanghailiang <address@hidden> wrote: - -> -On 2015/7/6 16:45, Paolo Bonzini wrote: -> -> -> -> -> -> On 06/07/2015 09:54, zhanghailiang wrote: -> ->> -> ->> From host, we found that QEMU vcpu1 thread and vcpu7 thread were not -> ->> consuming any cpu (Should be in idle state), -> ->> All of VCPUs' stacks in host is like bellow: -> ->> -> ->> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] -> ->> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] -> ->> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] -> ->> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] -> ->> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 -> ->> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 -> ->> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b -> ->> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 -> ->> [<ffffffffffffffff>] 0xffffffffffffffff -> ->> -> ->> We looked into the kernel codes that could leading to the above 'Stuck' -> ->> warning, -in current upstream there isn't any printk(...Stuck...) left since that code -path -has been reworked. -I've often seen this on over-committed host during guest CPUs up/down torture -test. -Could you update guest kernel to upstream and see if issue reproduces? - -> ->> and found that the only possible is the emulation of 'cpuid' instruct in -> ->> kvm/qemu has something wrong. -> ->> But since we can't reproduce this problem, we are not quite sure. -> ->> Is there any possible that the cupid emulation in kvm/qemu has some bug ? -> -> -> -> Can you explain the relationship to the cpuid emulation? What do the -> -> traces say about vcpus 1 and 7? -> -> -OK, we searched the VM's kernel codes with the 'Stuck' message, and it is -> -located in -> -do_boot_cpu(). It's in BSP context, the call process is: -> -BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() -> --> wakeup_secondary_via_INIT() to trigger APs. 
-> -It will wait 5s for APs to startup, if some AP not startup normally, it will -> -print 'CPU%d Stuck' or 'CPU%d: Not responding'. -> -> -If it prints 'Stuck', it means the AP has received the SIPI interrupt and -> -begins to execute the code -> -'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places -> -before smp_callin()(smpboot.c). -> -The follow is the starup process of BSP and AP. -> -BSP: -> -start_kernel() -> -->smp_init() -> -->smp_boot_cpus() -> -->do_boot_cpu() -> -->start_ip = trampoline_address(); //set the address that AP will -> -go to execute -> -->wakeup_secondary_cpu_via_init(); // kick the secondary CPU -> -->for (timeout = 0; timeout < 50000; timeout++) -> -if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if -> -AP startup or not -> -> -APs: -> -ENTRY(trampoline_data) (trampoline_64.S) -> -->ENTRY(secondary_startup_64) (head_64.S) -> -->start_secondary() (smpboot.c) -> -->cpu_init(); -> -->smp_callin(); -> -->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP -> -comes here, the BSP will not prints the error message. -> -> -From above call process, we can be sure that, the AP has been stuck between -> -trampoline_data and the cpumask_set_cpu() in -> -smp_callin(), we look through these codes path carefully, and only found a -> -'hlt' instruct that could block the process. -> -It is located in trampoline_data(): -> -> -ENTRY(trampoline_data) -> -... -> -> -call verify_cpu # Verify the cpu supports long mode -> -testl %eax, %eax # Check for return code -> -jnz no_longmode -> -> -... -> -> -no_longmode: -> -hlt -> -jmp no_longmode -> -> -For the process verify_cpu(), -> -we can only find the 'cpuid' sensitive instruct that could lead VM exit from -> -No-root mode. -> -This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to -> -the fail in verify_cpu. -> -> -From the message in VM, we know vcpu1 and vcpu7 is something wrong. -> -[ 5.060042] CPU1: Stuck ?? -> -[ 10.170815] CPU7: Stuck ?? 
-> -[ 10.171648] Brought up 6 CPUs -> -> -Besides, the follow is the cpus message got from host. -> -80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh -> -qemu-monitor-command instance-0000000 -> -* CPU #0: pc=0x00007f64160c683d thread_id=68570 -> -CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 -> -CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 -> -CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 -> -CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 -> -CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 -> -CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 -> -CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 -> -> -Oh, i also forgot to mention in the above message that, we have bond each -> -vCPU to different physical CPU in -> -host. -> -> -Thanks, -> -zhanghailiang -> -> -> -> -> --- -> -To unsubscribe from this list: send the line "unsubscribe kvm" in -> -the body of a message to address@hidden -> -More majordomo info at -http://vger.kernel.org/majordomo-info.html - -On 2015/7/7 19:23, Igor Mammedov wrote: -On Mon, 6 Jul 2015 17:59:10 +0800 -zhanghailiang <address@hidden> wrote: -On 2015/7/6 16:45, Paolo Bonzini wrote: -On 06/07/2015 09:54, zhanghailiang wrote: -From host, we found that QEMU vcpu1 thread and vcpu7 thread were not -consuming any cpu (Should be in idle state), -All of VCPUs' stacks in host is like bellow: - -[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] -[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] -[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] -[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] -[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 -[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 -[<ffffffff81468092>] system_call_fastpath+0x16/0x1b -[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 -[<ffffffffffffffff>] 0xffffffffffffffff - -We looked into the kernel codes that could leading to the above 'Stuck' -warning, -in current upstream there isn't any printk(...Stuck...) 
left since that code -path -has been reworked. -I've often seen this on over-committed host during guest CPUs up/down torture -test. -Could you update guest kernel to upstream and see if issue reproduces? -Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to -reproduce it. - -For your test case, is it a kernel bug? -Or is there any related patch could solve your test problem been merged into -upstream ? - -Thanks, -zhanghailiang -and found that the only possible is the emulation of 'cpuid' instruct in -kvm/qemu has something wrong. -But since we can't reproduce this problem, we are not quite sure. -Is there any possible that the cupid emulation in kvm/qemu has some bug ? -Can you explain the relationship to the cpuid emulation? What do the -traces say about vcpus 1 and 7? -OK, we searched the VM's kernel codes with the 'Stuck' message, and it is -located in -do_boot_cpu(). It's in BSP context, the call process is: -BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() --> wakeup_secondary_via_INIT() to trigger APs. -It will wait 5s for APs to startup, if some AP not startup normally, it will -print 'CPU%d Stuck' or 'CPU%d: Not responding'. - -If it prints 'Stuck', it means the AP has received the SIPI interrupt and -begins to execute the code -'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places before -smp_callin()(smpboot.c). -The follow is the starup process of BSP and AP. 
-BSP: -start_kernel() - ->smp_init() - ->smp_boot_cpus() - ->do_boot_cpu() - ->start_ip = trampoline_address(); //set the address that AP will -go to execute - ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU - ->for (timeout = 0; timeout < 50000; timeout++) - if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if -AP startup or not - -APs: -ENTRY(trampoline_data) (trampoline_64.S) - ->ENTRY(secondary_startup_64) (head_64.S) - ->start_secondary() (smpboot.c) - ->cpu_init(); - ->smp_callin(); - ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP -comes here, the BSP will not prints the error message. - - From above call process, we can be sure that, the AP has been stuck between -trampoline_data and the cpumask_set_cpu() in -smp_callin(), we look through these codes path carefully, and only found a -'hlt' instruct that could block the process. -It is located in trampoline_data(): - -ENTRY(trampoline_data) - ... - - call verify_cpu # Verify the cpu supports long mode - testl %eax, %eax # Check for return code - jnz no_longmode - - ... - -no_longmode: - hlt - jmp no_longmode - -For the process verify_cpu(), -we can only find the 'cpuid' sensitive instruct that could lead VM exit from -No-root mode. -This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to -the fail in verify_cpu. - - From the message in VM, we know vcpu1 and vcpu7 is something wrong. -[ 5.060042] CPU1: Stuck ?? -[ 10.170815] CPU7: Stuck ?? -[ 10.171648] Brought up 6 CPUs - -Besides, the follow is the cpus message got from host. 
-80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command -instance-0000000 -* CPU #0: pc=0x00007f64160c683d thread_id=68570 - CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 - CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 - CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 - CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 - CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 - CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 - CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 - -Oh, i also forgot to mention in the above message that, we have bond each vCPU -to different physical CPU in -host. - -Thanks, -zhanghailiang - - - - --- -To unsubscribe from this list: send the line "unsubscribe kvm" in -the body of a message to address@hidden -More majordomo info at -http://vger.kernel.org/majordomo-info.html -. - -On Tue, 7 Jul 2015 19:43:35 +0800 -zhanghailiang <address@hidden> wrote: - -> -On 2015/7/7 19:23, Igor Mammedov wrote: -> -> On Mon, 6 Jul 2015 17:59:10 +0800 -> -> zhanghailiang <address@hidden> wrote: -> -> -> ->> On 2015/7/6 16:45, Paolo Bonzini wrote: -> ->>> -> ->>> -> ->>> On 06/07/2015 09:54, zhanghailiang wrote: -> ->>>> -> ->>>> From host, we found that QEMU vcpu1 thread and vcpu7 thread were not -> ->>>> consuming any cpu (Should be in idle state), -> ->>>> All of VCPUs' stacks in host is like bellow: -> ->>>> -> ->>>> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] -> ->>>> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] -> ->>>> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] -> ->>>> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] -> ->>>> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 -> ->>>> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 -> ->>>> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b -> ->>>> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 -> ->>>> [<ffffffffffffffff>] 0xffffffffffffffff -> ->>>> -> ->>>> We looked into the kernel codes that could leading to the 
above 'Stuck' -> ->>>> warning, -> -> in current upstream there isn't any printk(...Stuck...) left since that -> -> code path -> -> has been reworked. -> -> I've often seen this on over-committed host during guest CPUs up/down -> -> torture test. -> -> Could you update guest kernel to upstream and see if issue reproduces? -> -> -> -> -Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to -> -reproduce it. -> -> -For your test case, is it a kernel bug? -> -Or is there any related patch could solve your test problem been merged into -> -upstream ? -I don't remember all prerequisite patches but you should be able to find -http://marc.info/?l=linux-kernel&m=140326703108009&w=2 -"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it" -and then look for dependencies. - - -> -> -Thanks, -> -zhanghailiang -> -> ->>>> and found that the only possible is the emulation of 'cpuid' instruct in -> ->>>> kvm/qemu has something wrong. -> ->>>> But since we can't reproduce this problem, we are not quite sure. -> ->>>> Is there any possible that the cupid emulation in kvm/qemu has some bug ? -> ->>> -> ->>> Can you explain the relationship to the cpuid emulation? What do the -> ->>> traces say about vcpus 1 and 7? -> ->> -> ->> OK, we searched the VM's kernel codes with the 'Stuck' message, and it is -> ->> located in -> ->> do_boot_cpu(). It's in BSP context, the call process is: -> ->> BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> -> ->> do_boot_cpu() -> wakeup_secondary_via_INIT() to trigger APs. -> ->> It will wait 5s for APs to startup, if some AP not startup normally, it -> ->> will print 'CPU%d Stuck' or 'CPU%d: Not responding'. -> ->> -> ->> If it prints 'Stuck', it means the AP has received the SIPI interrupt and -> ->> begins to execute the code -> ->> 'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places -> ->> before smp_callin()(smpboot.c). 
-> ->> The follow is the starup process of BSP and AP. -> ->> BSP: -> ->> start_kernel() -> ->> ->smp_init() -> ->> ->smp_boot_cpus() -> ->> ->do_boot_cpu() -> ->> ->start_ip = trampoline_address(); //set the address that AP -> ->> will go to execute -> ->> ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU -> ->> ->for (timeout = 0; timeout < 50000; timeout++) -> ->> if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// -> ->> check if AP startup or not -> ->> -> ->> APs: -> ->> ENTRY(trampoline_data) (trampoline_64.S) -> ->> ->ENTRY(secondary_startup_64) (head_64.S) -> ->> ->start_secondary() (smpboot.c) -> ->> ->cpu_init(); -> ->> ->smp_callin(); -> ->> ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP -> ->> comes here, the BSP will not prints the error message. -> ->> -> ->> From above call process, we can be sure that, the AP has been stuck -> ->> between trampoline_data and the cpumask_set_cpu() in -> ->> smp_callin(), we look through these codes path carefully, and only found a -> ->> 'hlt' instruct that could block the process. -> ->> It is located in trampoline_data(): -> ->> -> ->> ENTRY(trampoline_data) -> ->> ... -> ->> -> ->> call verify_cpu # Verify the cpu supports long mode -> ->> testl %eax, %eax # Check for return code -> ->> jnz no_longmode -> ->> -> ->> ... -> ->> -> ->> no_longmode: -> ->> hlt -> ->> jmp no_longmode -> ->> -> ->> For the process verify_cpu(), -> ->> we can only find the 'cpuid' sensitive instruct that could lead VM exit -> ->> from No-root mode. -> ->> This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading -> ->> to the fail in verify_cpu. -> ->> -> ->> From the message in VM, we know vcpu1 and vcpu7 is something wrong. -> ->> [ 5.060042] CPU1: Stuck ?? -> ->> [ 10.170815] CPU7: Stuck ?? -> ->> [ 10.171648] Brought up 6 CPUs -> ->> -> ->> Besides, the follow is the cpus message got from host. 
-> ->> 80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh -> ->> qemu-monitor-command instance-0000000 -> ->> * CPU #0: pc=0x00007f64160c683d thread_id=68570 -> ->> CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 -> ->> CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 -> ->> CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 -> ->> CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 -> ->> CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 -> ->> CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 -> ->> CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 -> ->> -> ->> Oh, i also forgot to mention in the above message that, we have bond each -> ->> vCPU to different physical CPU in -> ->> host. -> ->> -> ->> Thanks, -> ->> zhanghailiang -> ->> -> ->> -> ->> -> ->> -> ->> -- -> ->> To unsubscribe from this list: send the line "unsubscribe kvm" in -> ->> the body of a message to address@hidden -> ->> More majordomo info at -http://vger.kernel.org/majordomo-info.html -> -> -> -> -> -> . 
-> -> -> -> -> - -On 2015/7/7 20:21, Igor Mammedov wrote: -On Tue, 7 Jul 2015 19:43:35 +0800 -zhanghailiang <address@hidden> wrote: -On 2015/7/7 19:23, Igor Mammedov wrote: -On Mon, 6 Jul 2015 17:59:10 +0800 -zhanghailiang <address@hidden> wrote: -On 2015/7/6 16:45, Paolo Bonzini wrote: -On 06/07/2015 09:54, zhanghailiang wrote: -From host, we found that QEMU vcpu1 thread and vcpu7 thread were not -consuming any cpu (Should be in idle state), -All of VCPUs' stacks in host is like bellow: - -[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] -[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] -[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] -[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] -[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 -[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 -[<ffffffff81468092>] system_call_fastpath+0x16/0x1b -[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 -[<ffffffffffffffff>] 0xffffffffffffffff - -We looked into the kernel codes that could leading to the above 'Stuck' -warning, -in current upstream there isn't any printk(...Stuck...) left since that code -path -has been reworked. -I've often seen this on over-committed host during guest CPUs up/down torture -test. -Could you update guest kernel to upstream and see if issue reproduces? -Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to -reproduce it. - -For your test case, is it a kernel bug? -Or is there any related patch could solve your test problem been merged into -upstream ? -I don't remember all prerequisite patches but you should be able to find -http://marc.info/?l=linux-kernel&m=140326703108009&w=2 -"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it" -and then look for dependencies. -Er, we have investigated this patch, and it is not related to our problem, :) - -Thanks. -Thanks, -zhanghailiang -and found that the only possible is the emulation of 'cpuid' instruct in -kvm/qemu has something wrong. 
-But since we can't reproduce this problem, we are not quite sure. -Is there any possible that the cupid emulation in kvm/qemu has some bug ? -Can you explain the relationship to the cpuid emulation? What do the -traces say about vcpus 1 and 7? -OK, we searched the VM's kernel codes with the 'Stuck' message, and it is -located in -do_boot_cpu(). It's in BSP context, the call process is: -BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() --> wakeup_secondary_via_INIT() to trigger APs. -It will wait 5s for APs to startup, if some AP not startup normally, it will -print 'CPU%d Stuck' or 'CPU%d: Not responding'. - -If it prints 'Stuck', it means the AP has received the SIPI interrupt and -begins to execute the code -'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places before -smp_callin()(smpboot.c). -The follow is the starup process of BSP and AP. -BSP: -start_kernel() - ->smp_init() - ->smp_boot_cpus() - ->do_boot_cpu() - ->start_ip = trampoline_address(); //set the address that AP will -go to execute - ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU - ->for (timeout = 0; timeout < 50000; timeout++) - if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if -AP startup or not - -APs: -ENTRY(trampoline_data) (trampoline_64.S) - ->ENTRY(secondary_startup_64) (head_64.S) - ->start_secondary() (smpboot.c) - ->cpu_init(); - ->smp_callin(); - ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP -comes here, the BSP will not prints the error message. - - From above call process, we can be sure that, the AP has been stuck between -trampoline_data and the cpumask_set_cpu() in -smp_callin(), we look through these codes path carefully, and only found a -'hlt' instruct that could block the process. -It is located in trampoline_data(): - -ENTRY(trampoline_data) - ... - - call verify_cpu # Verify the cpu supports long mode - testl %eax, %eax # Check for return code - jnz no_longmode - - ... 
- -no_longmode: - hlt - jmp no_longmode - -For the process verify_cpu(), -we can only find the 'cpuid' sensitive instruct that could lead VM exit from -No-root mode. -This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to -the fail in verify_cpu. - - From the message in VM, we know vcpu1 and vcpu7 is something wrong. -[ 5.060042] CPU1: Stuck ?? -[ 10.170815] CPU7: Stuck ?? -[ 10.171648] Brought up 6 CPUs - -Besides, the follow is the cpus message got from host. -80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command -instance-0000000 -* CPU #0: pc=0x00007f64160c683d thread_id=68570 - CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 - CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 - CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 - CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 - CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 - CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 - CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 - -Oh, i also forgot to mention in the above message that, we have bond each vCPU -to different physical CPU in -host. - -Thanks, -zhanghailiang - - - - --- -To unsubscribe from this list: send the line "unsubscribe kvm" in -the body of a message to address@hidden -More majordomo info at -http://vger.kernel.org/majordomo-info.html -. -. 
- diff --git a/results/classifier/016/virtual/35170175 b/results/classifier/016/virtual/35170175 deleted file mode 100644 index 7ea10dab..00000000 --- a/results/classifier/016/virtual/35170175 +++ /dev/null @@ -1,548 +0,0 @@ -virtual: 0.801 -debug: 0.796 -x86: 0.144 -files: 0.086 -operating system: 0.076 -PID: 0.072 -TCG: 0.033 -register: 0.023 -kernel: 0.020 -assembly: 0.019 -i386: 0.018 -ppc: 0.013 -hypervisor: 0.013 -user-level: 0.008 -performance: 0.008 -semantic: 0.007 -device: 0.005 -architecture: 0.003 -arm: 0.003 -network: 0.003 -boot: 0.003 -VMM: 0.002 -graphic: 0.002 -permissions: 0.002 -peripherals: 0.002 -KVM: 0.002 -alpha: 0.001 -socket: 0.001 -risc-v: 0.001 -vnc: 0.001 -mistranslation: 0.001 - -[Qemu-devel] [BUG] QEMU crashes with dpdk virtio pmd - -Qemu crashes, with pre-condition: -vm xml config with multiqueue, and the vm's driver virtio-net support -multi-queue - -reproduce steps: -i. start dpdk testpmd in VM with the virtio nic -ii. stop testpmd -iii. reboot the VM - -This commit "f9d6dbf0 remove virtio queues if the guest doesn't support -multiqueue" is introduced. 
- -Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a) -VM DPDK version: DPDK-1.6.1 - -Call Trace: -#0 0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6 -#1 0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6 -#2 0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6 -#3 0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6 -#4 0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0 -#5 0x00007f6088fea32c in iter_remove_or_steal () from -/usr/lib64/libglib-2.0.so.0 -#6 0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) at -qom/object.c:410 -#7 object_finalize (data=0x7f6091e74800) at qom/object.c:467 -#8 object_unref (address@hidden) at qom/object.c:903 -#9 0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at -git/qemu/exec.c:1154 -#10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163 -#11 address_space_dispatch_free (d=0x7f6090b72b90) at git/qemu/exec.c:2514 -#12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at -util/rcu.c:272 -#13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0 -#14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6 - -Call Trace: -#0 0x00007fdccaeb9790 in ?? 
() -#1 0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at -qom/object.c:405 -#2 object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467 -#3 object_unref (address@hidden) at qom/object.c:903 -#4 0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at -git/qemu/exec.c:1154 -#5 phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163 -#6 address_space_dispatch_free (d=0x7fdcdc86a9e0) at git/qemu/exec.c:2514 -#7 0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at -util/rcu.c:272 -#8 0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0 -#9 0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6 - -The q->tx_bh will free in virtio_net_del_queue() function, when remove virtio -queues -if the guest doesn't support multiqueue. But it might be still referenced by -others (eg . virtio_net_set_status()), -which need so set NULL. - -diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c -index 7d091c9..98bd683 100644 ---- a/hw/net/virtio-net.c -+++ b/hw/net/virtio-net.c -@@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n, int index) - if (q->tx_timer) { - timer_del(q->tx_timer); - timer_free(q->tx_timer); -+ q->tx_timer = NULL; - } else { - qemu_bh_delete(q->tx_bh); -+ q->tx_bh = NULL; - } -+ q->tx_waiting = 0; - virtio_del_queue(vdev, index * 2 + 1); - } - -From: wangyunjian -Sent: Monday, April 24, 2017 6:10 PM -To: address@hidden; Michael S. Tsirkin <address@hidden>; 'Jason Wang' -<address@hidden> -Cc: wangyunjian <address@hidden>; caihe <address@hidden> -Subject: [Qemu-devel][BUG] QEMU crashes with dpdk virtio pmd - -Qemu crashes, with pre-condition: -vm xml config with multiqueue, and the vm's driver virtio-net support -multi-queue - -reproduce steps: -i. start dpdk testpmd in VM with the virtio nic -ii. stop testpmd -iii. reboot the VM - -This commit "f9d6dbf0 remove virtio queues if the guest doesn't support -multiqueue" is introduced. 
- -Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a) -VM DPDK version: DPDK-1.6.1 - -Call Trace: -#0 0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6 -#1 0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6 -#2 0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6 -#3 0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6 -#4 0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0 -#5 0x00007f6088fea32c in iter_remove_or_steal () from -/usr/lib64/libglib-2.0.so.0 -#6 0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) at -qom/object.c:410 -#7 object_finalize (data=0x7f6091e74800) at qom/object.c:467 -#8 object_unref (address@hidden) at qom/object.c:903 -#9 0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at -git/qemu/exec.c:1154 -#10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163 -#11 address_space_dispatch_free (d=0x7f6090b72b90) at git/qemu/exec.c:2514 -#12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at -util/rcu.c:272 -#13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0 -#14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6 - -Call Trace: -#0 0x00007fdccaeb9790 in ?? 
() -#1 0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at -qom/object.c:405 -#2 object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467 -#3 object_unref (address@hidden) at qom/object.c:903 -#4 0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at -git/qemu/exec.c:1154 -#5 phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163 -#6 address_space_dispatch_free (d=0x7fdcdc86a9e0) at git/qemu/exec.c:2514 -#7 0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at -util/rcu.c:272 -#8 0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0 -#9 0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6 - -On 2017年04月25日 19:37, wangyunjian wrote: -The q->tx_bh will free in virtio_net_del_queue() function, when remove virtio -queues -if the guest doesn't support multiqueue. But it might be still referenced by -others (eg . virtio_net_set_status()), -which need so set NULL. - -diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c -index 7d091c9..98bd683 100644 ---- a/hw/net/virtio-net.c -+++ b/hw/net/virtio-net.c -@@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n, int index) - if (q->tx_timer) { - timer_del(q->tx_timer); - timer_free(q->tx_timer); -+ q->tx_timer = NULL; - } else { - qemu_bh_delete(q->tx_bh); -+ q->tx_bh = NULL; - } -+ q->tx_waiting = 0; - virtio_del_queue(vdev, index * 2 + 1); - } -Thanks a lot for the fix. - -Two questions: -- If virtio_net_set_status() is the only function that may access tx_bh, -it looks like setting tx_waiting to zero is sufficient? -- Can you post a formal patch for this? - -Thanks -From: wangyunjian -Sent: Monday, April 24, 2017 6:10 PM -To: address@hidden; Michael S. 
Tsirkin <address@hidden>; 'Jason Wang' -<address@hidden> -Cc: wangyunjian <address@hidden>; caihe <address@hidden> -Subject: [Qemu-devel][BUG] QEMU crashes with dpdk virtio pmd - -Qemu crashes, with pre-condition: -vm xml config with multiqueue, and the vm's driver virtio-net support -multi-queue - -reproduce steps: -i. start dpdk testpmd in VM with the virtio nic -ii. stop testpmd -iii. reboot the VM - -This commit "f9d6dbf0 remove virtio queues if the guest doesn't support -multiqueue" is introduced. - -Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a) -VM DPDK version: DPDK-1.6.1 - -Call Trace: -#0 0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6 -#1 0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6 -#2 0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6 -#3 0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6 -#4 0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0 -#5 0x00007f6088fea32c in iter_remove_or_steal () from -/usr/lib64/libglib-2.0.so.0 -#6 0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) at -qom/object.c:410 -#7 object_finalize (data=0x7f6091e74800) at qom/object.c:467 -#8 object_unref (address@hidden) at qom/object.c:903 -#9 0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at -git/qemu/exec.c:1154 -#10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163 -#11 address_space_dispatch_free (d=0x7f6090b72b90) at git/qemu/exec.c:2514 -#12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at -util/rcu.c:272 -#13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0 -#14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6 - -Call Trace: -#0 0x00007fdccaeb9790 in ?? 
() -#1 0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at -qom/object.c:405 -#2 object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467 -#3 object_unref (address@hidden) at qom/object.c:903 -#4 0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at -git/qemu/exec.c:1154 -#5 phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163 -#6 address_space_dispatch_free (d=0x7fdcdc86a9e0) at git/qemu/exec.c:2514 -#7 0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at -util/rcu.c:272 -#8 0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0 -#9 0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6 - -CCing Paolo and Stefan, since it has a relationship with bh in Qemu. - -> ------Original Message----- -> -From: Jason Wang [ -mailto:address@hidden -> -> -> -On 2017-04-25 19:37, wangyunjian wrote: -> -> The q->tx_bh will free in virtio_net_del_queue() function, when remove -> -> virtio -> -queues -> -> if the guest doesn't support multiqueue. But it might be still referenced by -> -others (eg . virtio_net_set_status()), -> -> which need so set NULL. -> -> -> -> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c -> -> index 7d091c9..98bd683 100644 -> -> --- a/hw/net/virtio-net.c -> -> +++ b/hw/net/virtio-net.c -> -> @@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n, -> -int index) -> -> if (q->tx_timer) { -> -> timer_del(q->tx_timer); -> -> timer_free(q->tx_timer); -> -> + q->tx_timer = NULL; -> -> } else { -> -> qemu_bh_delete(q->tx_bh); -> -> + q->tx_bh = NULL; -> -> } -> -> + q->tx_waiting = 0; -> -> virtio_del_queue(vdev, index * 2 + 1); -> -> } -> -> -Thanks a lot for the fix. -> -> -Two questions: -> -> -- If virtio_net_set_status() is the only function that may access tx_bh, -> -it looks like setting tx_waiting to zero is sufficient? 
-Currently yes, but we don't assure that it works for all scenarios, so -we set the tx_bh and tx_timer to NULL to avoid to possibly access wild pointer, -which is the common method for usage of bh in Qemu. - -I have another question about the root cause of this issure. - -This below trace is the path of setting tx_waiting to one in -virtio_net_handle_tx_bh() : - -Breakpoint 1, virtio_net_handle_tx_bh (vdev=0x0, vq=0x7f335ad13900) at -/data/wyj/git/qemu/hw/net/virtio-net.c:1398 -1398 { -(gdb) bt -#0 virtio_net_handle_tx_bh (vdev=0x0, vq=0x7f335ad13900) at -/data/wyj/git/qemu/hw/net/virtio-net.c:1398 -#1 0x00007f3357bddf9c in virtio_bus_set_host_notifier (bus=<optimized out>, -address@hidden, address@hidden) at hw/virtio/virtio-bus.c:297 -#2 0x00007f3357a0055d in vhost_dev_disable_notifiers (address@hidden, -address@hidden) at /data/wyj/git/qemu/hw/virtio/vhost.c:1422 -#3 0x00007f33579e3373 in vhost_net_stop_one (net=0x7f335ad84dc0, -dev=0x7f335c6f5f90) at /data/wyj/git/qemu/hw/net/vhost_net.c:289 -#4 0x00007f33579e385b in vhost_net_stop (address@hidden, ncs=<optimized out>, -address@hidden) at /data/wyj/git/qemu/hw/net/vhost_net.c:367 -#5 0x00007f33579e15de in virtio_net_vhost_status (status=<optimized out>, -n=0x7f335c6f5f90) at /data/wyj/git/qemu/hw/net/virtio-net.c:176 -#6 virtio_net_set_status (vdev=0x7f335c6f5f90, status=0 '\000') at -/data/wyj/git/qemu/hw/net/virtio-net.c:250 -#7 0x00007f33579f8dc6 in virtio_set_status (address@hidden, address@hidden -'\000') at /data/wyj/git/qemu/hw/virtio/virtio.c:1146 -#8 0x00007f3357bdd3cc in virtio_ioport_write (val=0, addr=18, -opaque=0x7f335c6eda80) at hw/virtio/virtio-pci.c:387 -#9 virtio_pci_config_write (opaque=0x7f335c6eda80, addr=18, val=0, -size=<optimized out>) at hw/virtio/virtio-pci.c:511 -#10 0x00007f33579b2155 in memory_region_write_accessor (mr=0x7f335c6ee470, -addr=18, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized -out>, attrs=...) 
at /data/wyj/git/qemu/memory.c:526 -#11 0x00007f33579af2e9 in access_with_adjusted_size (address@hidden, -address@hidden, address@hidden, access_size_min=<optimized out>, -access_size_max=<optimized out>, address@hidden - 0x7f33579b20f0 <memory_region_write_accessor>, address@hidden, -address@hidden) at /data/wyj/git/qemu/memory.c:592 -#12 0x00007f33579b2e15 in memory_region_dispatch_write (address@hidden, -address@hidden, data=0, address@hidden, address@hidden) at -/data/wyj/git/qemu/memory.c:1319 -#13 0x00007f335796cd93 in address_space_write_continue (mr=0x7f335c6ee470, l=1, -addr1=18, len=1, buf=0x7f335773d000 "", attrs=..., addr=49170, -as=0x7f3358317060 <address_space_io>) at /data/wyj/git/qemu/exec.c:2834 -#14 address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., -buf=<optimized out>, len=<optimized out>) at /data/wyj/git/qemu/exec.c:2879 -#15 0x00007f335796d3ad in address_space_rw (as=<optimized out>, address@hidden, -attrs=..., address@hidden, buf=<optimized out>, address@hidden, address@hidden) -at /data/wyj/git/qemu/exec.c:2981 -#16 0x00007f33579ae226 in kvm_handle_io (count=1, size=1, direction=<optimized -out>, data=<optimized out>, attrs=..., port=49170) at -/data/wyj/git/qemu/kvm-all.c:1803 -#17 kvm_cpu_exec (address@hidden) at /data/wyj/git/qemu/kvm-all.c:2032 -#18 0x00007f335799b632 in qemu_kvm_cpu_thread_fn (arg=0x7f335ae82070) at -/data/wyj/git/qemu/cpus.c:1118 -#19 0x00007f3352983dc5 in start_thread () from /usr/lib64/libpthread.so.0 -#20 0x00007f335113571d in clone () from /usr/lib64/libc.so.6 - -It calls qemu_bh_schedule(q->tx_bh) at the bottom of virtio_net_handle_tx_bh(), -I don't know why virtio_net_tx_bh() doesn't be invoked, so that the -q->tx_waiting is not zero. -[ps: we added logs in virtio_net_tx_bh() to verify that] - -Some other information: - -It won't crash if we don't use vhost-net. - - -Thanks, --Gonglei - -> -- Can you post a formal patch for this? 
-> -> -Thanks -> -> -> From: wangyunjian -> -> Sent: Monday, April 24, 2017 6:10 PM -> -> To: address@hidden; Michael S. Tsirkin <address@hidden>; 'Jason -> -Wang' <address@hidden> -> -> Cc: wangyunjian <address@hidden>; caihe <address@hidden> -> -> Subject: [Qemu-devel][BUG] QEMU crashes with dpdk virtio pmd -> -> -> -> Qemu crashes, with pre-condition: -> -> vm xml config with multiqueue, and the vm's driver virtio-net support -> -multi-queue -> -> -> -> reproduce steps: -> -> i. start dpdk testpmd in VM with the virtio nic -> -> ii. stop testpmd -> -> iii. reboot the VM -> -> -> -> This commit "f9d6dbf0 remove virtio queues if the guest doesn't support -> -multiqueue" is introduced. -> -> -> -> Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a) -> -> VM DPDK version: DPDK-1.6.1 -> -> -> -> Call Trace: -> -> #0 0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6 -> -> #1 0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6 -> -> #2 0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6 -> -> #3 0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6 -> -> #4 0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0 -> -> #5 0x00007f6088fea32c in iter_remove_or_steal () from -> -/usr/lib64/libglib-2.0.so.0 -> -> #6 0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) -> -at qom/object.c:410 -> -> #7 object_finalize (data=0x7f6091e74800) at qom/object.c:467 -> -> #8 object_unref (address@hidden) at qom/object.c:903 -> -> #9 0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at -> -git/qemu/exec.c:1154 -> -> #10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163 -> -> #11 address_space_dispatch_free (d=0x7f6090b72b90) at -> -git/qemu/exec.c:2514 -> -> #12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at -> -util/rcu.c:272 -> -> #13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0 -> -> #14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6 
-> -> -> -> Call Trace: -> -> #0 0x00007fdccaeb9790 in ?? () -> -> #1 0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at -> -qom/object.c:405 -> -> #2 object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467 -> -> #3 object_unref (address@hidden) at qom/object.c:903 -> -> #4 0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at -> -git/qemu/exec.c:1154 -> -> #5 phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163 -> -> #6 address_space_dispatch_free (d=0x7fdcdc86a9e0) at -> -git/qemu/exec.c:2514 -> -> #7 0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at -> -util/rcu.c:272 -> -> #8 0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0 -> -> #9 0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6 -> -> -> -> - -On 25/04/2017 14:02, Jason Wang wrote: -> -> -Thanks a lot for the fix. -> -> -Two questions: -> -> -- If virtio_net_set_status() is the only function that may access tx_bh, -> -it looks like setting tx_waiting to zero is sufficient? -I think clearing tx_bh is better anyway, as leaving a dangling pointer -is not very hygienic. - -Paolo - -> -- Can you post a formal patch for this? 
- diff --git a/results/classifier/016/virtual/36568044 b/results/classifier/016/virtual/36568044 deleted file mode 100644 index 4507a105..00000000 --- a/results/classifier/016/virtual/36568044 +++ /dev/null @@ -1,4608 +0,0 @@ -virtual: 0.897 -hypervisor: 0.757 -KVM: 0.717 -debug: 0.694 -socket: 0.541 -kernel: 0.452 -TCG: 0.366 -x86: 0.260 -network: 0.208 -register: 0.159 -operating system: 0.097 -device: 0.063 -PID: 0.049 -files: 0.034 -VMM: 0.031 -risc-v: 0.023 -assembly: 0.017 -ppc: 0.015 -alpha: 0.011 -peripherals: 0.010 -user-level: 0.009 -i386: 0.008 -semantic: 0.008 -performance: 0.007 -architecture: 0.007 -graphic: 0.004 -permissions: 0.004 -arm: 0.003 -vnc: 0.003 -boot: 0.003 -mistranslation: 0.001 - -[BUG, RFC] cpr-transfer: qxl guest driver crashes after migration - -Hi all, - -We've been experimenting with cpr-transfer migration mode recently and -have discovered the following issue with the guest QXL driver: - -Run migration source: -> -EMULATOR=/path/to/emulator -> -ROOTFS=/path/to/image -> -QMPSOCK=/var/run/alma8qmp-src.sock -> -> -$EMULATOR -enable-kvm \ -> --machine q35 \ -> --cpu host -smp 2 -m 2G \ -> --object -> -memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ -> --machine memory-backend=ram0 \ -> --machine aux-ram-share=on \ -> --drive file=$ROOTFS,media=disk,if=virtio \ -> --qmp unix:$QMPSOCK,server=on,wait=off \ -> --nographic \ -> --device qxl-vga -Run migration target: -> -EMULATOR=/path/to/emulator -> -ROOTFS=/path/to/image -> -QMPSOCK=/var/run/alma8qmp-dst.sock -> -> -> -> -$EMULATOR -enable-kvm \ -> --machine q35 \ -> --cpu host -smp 2 -m 2G \ -> --object -> -memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ -> --machine memory-backend=ram0 \ -> --machine aux-ram-share=on \ -> --drive file=$ROOTFS,media=disk,if=virtio \ -> --qmp unix:$QMPSOCK,server=on,wait=off \ -> --nographic \ -> --device qxl-vga \ -> --incoming tcp:0:44444 \ -> --incoming '{"channel-type": "cpr", "addr": { "transport": "socket", 
-> -"type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -Launch the migration: -> -QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -> -QMPSOCK=/var/run/alma8qmp-src.sock -> -> -$QMPSHELL -p $QMPSOCK <<EOF -> -migrate-set-parameters mode=cpr-transfer -> -migrate -> -channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}] -> -EOF -Then, after a while, QXL guest driver on target crashes spewing the -following messages: -> -[ 73.962002] [TTM] Buffer eviction failed -> -[ 73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001) -> -[ 73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate -> -VRAM BO -That seems to be a known kernel QXL driver bug: -https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -(the latter discussion contains that reproduce script which speeds up -the crash in the guest): -> -#!/bin/bash -> -> -chvt 3 -> -> -for j in $(seq 80); do -> -echo "$(date) starting round $j" -> -if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" -> -]; then -> -echo "bug was reproduced after $j tries" -> -exit 1 -> -fi -> -for i in $(seq 100); do -> -dmesg > /dev/tty3 -> -done -> -done -> -> -echo "bug could not be reproduced" -> -exit 0 -The bug itself seems to remain unfixed, as I was able to reproduce that -with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -cpr-transfer code also seems to be buggy as it triggers the crash - -without the cpr-transfer migration the above reproduce doesn't lead to -crash on the source VM. - -I suspect that, as cpr-transfer doesn't migrate the guest memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM. However, I wasn't able to trace the -corruption so far. 
- -Could somebody help the investigation and take a look into this? Any -suggestions would be appreciated. Thanks! - -Andrey - -On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -Hi all, - -We've been experimenting with cpr-transfer migration mode recently and -have discovered the following issue with the guest QXL driver: - -Run migration source: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-src.sock - -$EMULATOR -enable-kvm \ - -machine q35 \ - -cpu host -smp 2 -m 2G \ - -object -memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ - -machine memory-backend=ram0 \ - -machine aux-ram-share=on \ - -drive file=$ROOTFS,media=disk,if=virtio \ - -qmp unix:$QMPSOCK,server=on,wait=off \ - -nographic \ - -device qxl-vga -Run migration target: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-dst.sock -$EMULATOR -enable-kvm \ --machine q35 \ - -cpu host -smp 2 -m 2G \ - -object -memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ - -machine memory-backend=ram0 \ - -machine aux-ram-share=on \ - -drive file=$ROOTFS,media=disk,if=virtio \ - -qmp unix:$QMPSOCK,server=on,wait=off \ - -nographic \ - -device qxl-vga \ - -incoming tcp:0:44444 \ - -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", -"path": "/var/run/alma8cpr-dst.sock"}}' -Launch the migration: -QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -QMPSOCK=/var/run/alma8qmp-src.sock - -$QMPSHELL -p $QMPSOCK <<EOF - migrate-set-parameters mode=cpr-transfer - migrate -channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}] -EOF -Then, after a while, QXL guest driver on target crashes spewing the -following messages: -[ 73.962002] [TTM] Buffer eviction failed -[ 73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001) -[ 73.962081] 
[drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate -VRAM BO -That seems to be a known kernel QXL driver bug: -https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -(the latter discussion contains that reproduce script which speeds up -the crash in the guest): -#!/bin/bash - -chvt 3 - -for j in $(seq 80); do - echo "$(date) starting round $j" - if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" -]; then - echo "bug was reproduced after $j tries" - exit 1 - fi - for i in $(seq 100); do - dmesg > /dev/tty3 - done -done - -echo "bug could not be reproduced" -exit 0 -The bug itself seems to remain unfixed, as I was able to reproduce that -with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -cpr-transfer code also seems to be buggy as it triggers the crash - -without the cpr-transfer migration the above reproduce doesn't lead to -crash on the source VM. - -I suspect that, as cpr-transfer doesn't migrate the guest memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM. However, I wasn't able to trace the -corruption so far. - -Could somebody help the investigation and take a look into this? Any -suggestions would be appreciated. Thanks! -Possibly some memory region created by qxl is not being preserved. 
Try adding these traces to see what is preserved: - --trace enable='*cpr*' --trace enable='*ram_alloc*' - -- Steve - -On 2/28/2025 1:13 PM, Steven Sistare wrote: -On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -Hi all, - -We've been experimenting with cpr-transfer migration mode recently and -have discovered the following issue with the guest QXL driver: - -Run migration source: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-src.sock - -$EMULATOR -enable-kvm \ -    -machine q35 \ -    -cpu host -smp 2 -m 2G \ -    -object -memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ -    -machine memory-backend=ram0 \ -    -machine aux-ram-share=on \ -    -drive file=$ROOTFS,media=disk,if=virtio \ -    -qmp unix:$QMPSOCK,server=on,wait=off \ -    -nographic \ -    -device qxl-vga -Run migration target: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-dst.sock -$EMULATOR -enable-kvm \ -    -machine q35 \ -    -cpu host -smp 2 -m 2G \ -    -object -memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ -    -machine memory-backend=ram0 \ -    -machine aux-ram-share=on \ -    -drive file=$ROOTFS,media=disk,if=virtio \ -    -qmp unix:$QMPSOCK,server=on,wait=off \ -    -nographic \ -    -device qxl-vga \ -    -incoming tcp:0:44444 \ -    -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", -"path": "/var/run/alma8cpr-dst.sock"}}' -Launch the migration: -QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -QMPSOCK=/var/run/alma8qmp-src.sock - -$QMPSHELL -p $QMPSOCK <<EOF -    migrate-set-parameters mode=cpr-transfer -    migrate -channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}] -EOF -Then, after a while, QXL guest driver on target crashes spewing the -following messages: -[  73.962002] [TTM] Buffer 
eviction failed -[  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001) -[  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate -VRAM BO -That seems to be a known kernel QXL driver bug: -https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -(the latter discussion contains that reproduce script which speeds up -the crash in the guest): -#!/bin/bash - -chvt 3 - -for j in $(seq 80); do -        echo "$(date) starting round $j" -        if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" -]; then -                echo "bug was reproduced after $j tries" -                exit 1 -        fi -        for i in $(seq 100); do -                dmesg > /dev/tty3 -        done -done - -echo "bug could not be reproduced" -exit 0 -The bug itself seems to remain unfixed, as I was able to reproduce that -with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -cpr-transfer code also seems to be buggy as it triggers the crash - -without the cpr-transfer migration the above reproduce doesn't lead to -crash on the source VM. - -I suspect that, as cpr-transfer doesn't migrate the guest memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM. However, I wasn't able to trace the -corruption so far. - -Could somebody help the investigation and take a look into this? Any -suggestions would be appreciated. Thanks! -Possibly some memory region created by qxl is not being preserved. -Try adding these traces to see what is preserved: - --trace enable='*cpr*' --trace enable='*ram_alloc*' -Also try adding this patch to see if it flags any ram blocks as not -compatible with cpr. A message is printed at migration start time. 
-https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email-steven.sistare@oracle.com/ -- Steve - -On 2/28/25 8:20 PM, Steven Sistare wrote: -> -On 2/28/2025 1:13 PM, Steven Sistare wrote: -> -> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -> ->> Hi all, -> ->> -> ->> We've been experimenting with cpr-transfer migration mode recently and -> ->> have discovered the following issue with the guest QXL driver: -> ->> -> ->> Run migration source: -> ->>> EMULATOR=/path/to/emulator -> ->>> ROOTFS=/path/to/image -> ->>> QMPSOCK=/var/run/alma8qmp-src.sock -> ->>> -> ->>> $EMULATOR -enable-kvm \ -> ->>>     -machine q35 \ -> ->>>     -cpu host -smp 2 -m 2G \ -> ->>>     -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -> ->>> ram0,share=on\ -> ->>>     -machine memory-backend=ram0 \ -> ->>>     -machine aux-ram-share=on \ -> ->>>     -drive file=$ROOTFS,media=disk,if=virtio \ -> ->>>     -qmp unix:$QMPSOCK,server=on,wait=off \ -> ->>>     -nographic \ -> ->>>     -device qxl-vga -> ->> -> ->> Run migration target: -> ->>> EMULATOR=/path/to/emulator -> ->>> ROOTFS=/path/to/image -> ->>> QMPSOCK=/var/run/alma8qmp-dst.sock -> ->>> $EMULATOR -enable-kvm \ -> ->>>     -machine q35 \ -> ->>>     -cpu host -smp 2 -m 2G \ -> ->>>     -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -> ->>> ram0,share=on\ -> ->>>     -machine memory-backend=ram0 \ -> ->>>     -machine aux-ram-share=on \ -> ->>>     -drive file=$ROOTFS,media=disk,if=virtio \ -> ->>>     -qmp unix:$QMPSOCK,server=on,wait=off \ -> ->>>     -nographic \ -> ->>>     -device qxl-vga \ -> ->>>     -incoming tcp:0:44444 \ -> ->>>     -incoming '{"channel-type": "cpr", "addr": { "transport": -> ->>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -> ->> -> ->> -> ->> Launch the migration: -> ->>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -> ->>> QMPSOCK=/var/run/alma8qmp-src.sock -> ->>> 
-> ->>> $QMPSHELL -p $QMPSOCK <<EOF -> ->>>     migrate-set-parameters mode=cpr-transfer -> ->>>     migrate channels=[{"channel-type":"main","addr": -> ->>> {"transport":"socket","type":"inet","host":"0","port":"44444"}}, -> ->>> {"channel-type":"cpr","addr": -> ->>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr- -> ->>> dst.sock"}}] -> ->>> EOF -> ->> -> ->> Then, after a while, QXL guest driver on target crashes spewing the -> ->> following messages: -> ->>> [  73.962002] [TTM] Buffer eviction failed -> ->>> [  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, -> ->>> 0x00000001) -> ->>> [  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to -> ->>> allocate VRAM BO -> ->> -> ->> That seems to be a known kernel QXL driver bug: -> ->> -> ->> -https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ -> ->> -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -> ->> -> ->> (the latter discussion contains that reproduce script which speeds up -> ->> the crash in the guest): -> ->>> #!/bin/bash -> ->>> -> ->>> chvt 3 -> ->>> -> ->>> for j in $(seq 80); do -> ->>>         echo "$(date) starting round $j" -> ->>>         if [ "$(journalctl --boot | grep "failed to allocate VRAM -> ->>> BO")" != "" ]; then -> ->>>                 echo "bug was reproduced after $j tries" -> ->>>                 exit 1 -> ->>>         fi -> ->>>         for i in $(seq 100); do -> ->>>                 dmesg > /dev/tty3 -> ->>>         done -> ->>> done -> ->>> -> ->>> echo "bug could not be reproduced" -> ->>> exit 0 -> ->> -> ->> The bug itself seems to remain unfixed, as I was able to reproduce that -> ->> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -> ->> cpr-transfer code also seems to be buggy as it triggers the crash - -> ->> without the cpr-transfer migration the above reproduce doesn't lead to -> ->> crash on the source VM. 
-> ->> -> ->> I suspect that, as cpr-transfer doesn't migrate the guest memory, but -> ->> rather passes it through the memory backend object, our code might -> ->> somehow corrupt the VRAM. However, I wasn't able to trace the -> ->> corruption so far. -> ->> -> ->> Could somebody help the investigation and take a look into this? Any -> ->> suggestions would be appreciated. Thanks! -> -> -> -> Possibly some memory region created by qxl is not being preserved. -> -> Try adding these traces to see what is preserved: -> -> -> -> -trace enable='*cpr*' -> -> -trace enable='*ram_alloc*' -> -> -Also try adding this patch to see if it flags any ram blocks as not -> -compatible with cpr. A message is printed at migration start time. -> - -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email- -> -steven.sistare@oracle.com/ -> -> -- Steve -> -With the traces enabled + the "migration: ram block cpr blockers" patch -applied: - -Source: -> -cpr_find_fd pc.bios, id 0 returns -1 -> -cpr_save_fd pc.bios, id 0, fd 22 -> -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -> -0x7fec18e00000 -> -cpr_find_fd pc.rom, id 0 returns -1 -> -cpr_save_fd pc.rom, id 0, fd 23 -> -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -> -0x7fec18c00000 -> -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -> -cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -> -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd -> -24 host 0x7fec18a00000 -> -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -> -cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -> -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 -> -fd 25 host 0x7feb77e00000 -> -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -> -cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -> -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 27 -> -host 0x7fec18800000 -> -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -> 
-cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -> -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 -> -fd 28 host 0x7feb73c00000 -> -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -> -cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -> -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 34 -> -host 0x7fec18600000 -> -cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -> -cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 -> -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 35 -> -host 0x7fec18200000 -> -cpr_find_fd /rom@etc/table-loader, id 0 returns -1 -> -cpr_save_fd /rom@etc/table-loader, id 0, fd 36 -> -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 36 -> -host 0x7feb8b600000 -> -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -> -cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 -> -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 37 host -> -0x7feb8b400000 -> -> -cpr_state_save cpr-transfer mode -> -cpr_transfer_output /var/run/alma8cpr-dst.sock -Target: -> -cpr_transfer_input /var/run/alma8cpr-dst.sock -> -cpr_state_load cpr-transfer mode -> -cpr_find_fd pc.bios, id 0 returns 20 -> -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -> -0x7fcdc9800000 -> -cpr_find_fd pc.rom, id 0 returns 19 -> -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -> -0x7fcdc9600000 -> -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -> -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd -> -18 host 0x7fcdc9400000 -> -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 -> -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 -> -fd 17 host 0x7fcd27e00000 -> -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -> -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 16 -> -host 0x7fcdc9200000 -> -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 -> 
-qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 -> -fd 15 host 0x7fcd23c00000 -> -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 -> -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 14 -> -host 0x7fcdc8800000 -> -cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -> -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 13 -> -host 0x7fcdc8400000 -> -cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -> -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 11 -> -host 0x7fcdc8200000 -> -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 -> -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 10 host -> -0x7fcd3be00000 -Looks like both vga.vram and qxl.vram are being preserved (with the same -addresses), and no incompatible ram blocks are found during migration. - -Andrey - -On 2/28/25 8:35 PM, Andrey Drobyshev wrote: -> -On 2/28/25 8:20 PM, Steven Sistare wrote: -> -> On 2/28/2025 1:13 PM, Steven Sistare wrote: -> ->> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -> ->>> Hi all, -> ->>> -> ->>> We've been experimenting with cpr-transfer migration mode recently and -> ->>> have discovered the following issue with the guest QXL driver: -> ->>> -> ->>> Run migration source: -> ->>>> EMULATOR=/path/to/emulator -> ->>>> ROOTFS=/path/to/image -> ->>>> QMPSOCK=/var/run/alma8qmp-src.sock -> ->>>> -> ->>>> $EMULATOR -enable-kvm \ -> ->>>>     -machine q35 \ -> ->>>>     -cpu host -smp 2 -m 2G \ -> ->>>>     -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -> ->>>> ram0,share=on\ -> ->>>>     -machine memory-backend=ram0 \ -> ->>>>     -machine aux-ram-share=on \ -> ->>>>     -drive file=$ROOTFS,media=disk,if=virtio \ -> ->>>>     -qmp unix:$QMPSOCK,server=on,wait=off \ -> ->>>>     -nographic \ -> ->>>>     -device qxl-vga -> ->>> -> ->>> Run migration target: -> ->>>> EMULATOR=/path/to/emulator -> ->>>> ROOTFS=/path/to/image -> ->>>> 
QMPSOCK=/var/run/alma8qmp-dst.sock -> ->>>> $EMULATOR -enable-kvm \ -> ->>>>     -machine q35 \ -> ->>>>     -cpu host -smp 2 -m 2G \ -> ->>>>     -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -> ->>>> ram0,share=on\ -> ->>>>     -machine memory-backend=ram0 \ -> ->>>>     -machine aux-ram-share=on \ -> ->>>>     -drive file=$ROOTFS,media=disk,if=virtio \ -> ->>>>     -qmp unix:$QMPSOCK,server=on,wait=off \ -> ->>>>     -nographic \ -> ->>>>     -device qxl-vga \ -> ->>>>     -incoming tcp:0:44444 \ -> ->>>>     -incoming '{"channel-type": "cpr", "addr": { "transport": -> ->>>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -> ->>> -> ->>> -> ->>> Launch the migration: -> ->>>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -> ->>>> QMPSOCK=/var/run/alma8qmp-src.sock -> ->>>> -> ->>>> $QMPSHELL -p $QMPSOCK <<EOF -> ->>>>     migrate-set-parameters mode=cpr-transfer -> ->>>>     migrate channels=[{"channel-type":"main","addr": -> ->>>> {"transport":"socket","type":"inet","host":"0","port":"44444"}}, -> ->>>> {"channel-type":"cpr","addr": -> ->>>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr- -> ->>>> dst.sock"}}] -> ->>>> EOF -> ->>> -> ->>> Then, after a while, QXL guest driver on target crashes spewing the -> ->>> following messages: -> ->>>> [  73.962002] [TTM] Buffer eviction failed -> ->>>> [  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, -> ->>>> 0x00000001) -> ->>>> [  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to -> ->>>> allocate VRAM BO -> ->>> -> ->>> That seems to be a known kernel QXL driver bug: -> ->>> -> ->>> -https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ -> ->>> -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -> ->>> -> ->>> (the latter discussion contains that reproduce script which speeds up -> ->>> the crash in the guest): -> ->>>> #!/bin/bash -> ->>>> -> ->>>> chvt 3 -> ->>>> -> ->>>> for j in $(seq 80); do -> ->>>>    
     echo "$(date) starting round $j" -> ->>>>         if [ "$(journalctl --boot | grep "failed to allocate VRAM -> ->>>> BO")" != "" ]; then -> ->>>>                 echo "bug was reproduced after $j tries" -> ->>>>                 exit 1 -> ->>>>         fi -> ->>>>         for i in $(seq 100); do -> ->>>>                 dmesg > /dev/tty3 -> ->>>>         done -> ->>>> done -> ->>>> -> ->>>> echo "bug could not be reproduced" -> ->>>> exit 0 -> ->>> -> ->>> The bug itself seems to remain unfixed, as I was able to reproduce that -> ->>> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -> ->>> cpr-transfer code also seems to be buggy as it triggers the crash - -> ->>> without the cpr-transfer migration the above reproduce doesn't lead to -> ->>> crash on the source VM. -> ->>> -> ->>> I suspect that, as cpr-transfer doesn't migrate the guest memory, but -> ->>> rather passes it through the memory backend object, our code might -> ->>> somehow corrupt the VRAM. However, I wasn't able to trace the -> ->>> corruption so far. -> ->>> -> ->>> Could somebody help the investigation and take a look into this? Any -> ->>> suggestions would be appreciated. Thanks! -> ->> -> ->> Possibly some memory region created by qxl is not being preserved. -> ->> Try adding these traces to see what is preserved: -> ->> -> ->> -trace enable='*cpr*' -> ->> -trace enable='*ram_alloc*' -> -> -> -> Also try adding this patch to see if it flags any ram blocks as not -> -> compatible with cpr. A message is printed at migration start time. 
-> ->  -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email- -> -> steven.sistare@oracle.com/ -> -> -> -> - Steve -> -> -> -> -With the traces enabled + the "migration: ram block cpr blockers" patch -> -applied: -> -> -Source: -> -> cpr_find_fd pc.bios, id 0 returns -1 -> -> cpr_save_fd pc.bios, id 0, fd 22 -> -> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -> -> 0x7fec18e00000 -> -> cpr_find_fd pc.rom, id 0 returns -1 -> -> cpr_save_fd pc.rom, id 0, fd 23 -> -> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -> -> 0x7fec18c00000 -> -> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -> -> cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -> -> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd -> -> 24 host 0x7fec18a00000 -> -> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -> -> cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -> -> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 -> -> fd 25 host 0x7feb77e00000 -> -> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -> -> cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -> -> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 27 -> -> host 0x7fec18800000 -> -> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -> -> cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -> -> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 -> -> fd 28 host 0x7feb73c00000 -> -> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -> -> cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -> -> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 34 -> -> host 0x7fec18600000 -> -> cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -> -> cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 -> -> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd -> -> 35 host 0x7fec18200000 -> -> cpr_find_fd /rom@etc/table-loader, id 0 returns -1 -> -> cpr_save_fd 
/rom@etc/table-loader, id 0, fd 36 -> -> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 36 -> -> host 0x7feb8b600000 -> -> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -> -> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 -> -> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 37 host -> -> 0x7feb8b400000 -> -> -> -> cpr_state_save cpr-transfer mode -> -> cpr_transfer_output /var/run/alma8cpr-dst.sock -> -> -Target: -> -> cpr_transfer_input /var/run/alma8cpr-dst.sock -> -> cpr_state_load cpr-transfer mode -> -> cpr_find_fd pc.bios, id 0 returns 20 -> -> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -> -> 0x7fcdc9800000 -> -> cpr_find_fd pc.rom, id 0 returns 19 -> -> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -> -> 0x7fcdc9600000 -> -> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -> -> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd -> -> 18 host 0x7fcdc9400000 -> -> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 -> -> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 -> -> fd 17 host 0x7fcd27e00000 -> -> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -> -> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 16 -> -> host 0x7fcdc9200000 -> -> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 -> -> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 -> -> fd 15 host 0x7fcd23c00000 -> -> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 -> -> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 14 -> -> host 0x7fcdc8800000 -> -> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -> -> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd -> -> 13 host 0x7fcdc8400000 -> -> cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -> -> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 11 -> -> host 0x7fcdc8200000 
-> -> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 -> -> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 10 host -> -> 0x7fcd3be00000 -> -> -Looks like both vga.vram and qxl.vram are being preserved (with the same -> -addresses), and no incompatible ram blocks are found during migration. -> -Sorry, addressed are not the same, of course. However corresponding ram -blocks do seem to be preserved and initialized. - -On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: -On 2/28/25 8:35 PM, Andrey Drobyshev wrote: -On 2/28/25 8:20 PM, Steven Sistare wrote: -On 2/28/2025 1:13 PM, Steven Sistare wrote: -On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -Hi all, - -We've been experimenting with cpr-transfer migration mode recently and -have discovered the following issue with the guest QXL driver: - -Run migration source: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-src.sock - -$EMULATOR -enable-kvm \ -     -machine q35 \ -     -cpu host -smp 2 -m 2G \ -     -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -ram0,share=on\ -     -machine memory-backend=ram0 \ -     -machine aux-ram-share=on \ -     -drive file=$ROOTFS,media=disk,if=virtio \ -     -qmp unix:$QMPSOCK,server=on,wait=off \ -     -nographic \ -     -device qxl-vga -Run migration target: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-dst.sock -$EMULATOR -enable-kvm \ -     -machine q35 \ -     -cpu host -smp 2 -m 2G \ -     -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -ram0,share=on\ -     -machine memory-backend=ram0 \ -     -machine aux-ram-share=on \ -     -drive file=$ROOTFS,media=disk,if=virtio \ -     -qmp unix:$QMPSOCK,server=on,wait=off \ -     -nographic \ -     -device qxl-vga \ -     -incoming tcp:0:44444 \ -     -incoming '{"channel-type": "cpr", "addr": { "transport": -"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -Launch the migration: 
-QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -QMPSOCK=/var/run/alma8qmp-src.sock - -$QMPSHELL -p $QMPSOCK <<EOF -     migrate-set-parameters mode=cpr-transfer -     migrate channels=[{"channel-type":"main","addr": -{"transport":"socket","type":"inet","host":"0","port":"44444"}}, -{"channel-type":"cpr","addr": -{"transport":"socket","type":"unix","path":"/var/run/alma8cpr- -dst.sock"}}] -EOF -Then, after a while, QXL guest driver on target crashes spewing the -following messages: -[  73.962002] [TTM] Buffer eviction failed -[  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, -0x00000001) -[  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to -allocate VRAM BO -That seems to be a known kernel QXL driver bug: -https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -(the latter discussion contains that reproduce script which speeds up -the crash in the guest): -#!/bin/bash - -chvt 3 - -for j in $(seq 80); do -         echo "$(date) starting round $j" -         if [ "$(journalctl --boot | grep "failed to allocate VRAM -BO")" != "" ]; then -                 echo "bug was reproduced after $j tries" -                 exit 1 -         fi -         for i in $(seq 100); do -                 dmesg > /dev/tty3 -         done -done - -echo "bug could not be reproduced" -exit 0 -The bug itself seems to remain unfixed, as I was able to reproduce that -with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -cpr-transfer code also seems to be buggy as it triggers the crash - -without the cpr-transfer migration the above reproduce doesn't lead to -crash on the source VM. - -I suspect that, as cpr-transfer doesn't migrate the guest memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM. However, I wasn't able to trace the -corruption so far. - -Could somebody help the investigation and take a look into this? 
Any -suggestions would be appreciated. Thanks! -Possibly some memory region created by qxl is not being preserved. -Try adding these traces to see what is preserved: - --trace enable='*cpr*' --trace enable='*ram_alloc*' -Also try adding this patch to see if it flags any ram blocks as not -compatible with cpr. A message is printed at migration start time. -  -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email- -steven.sistare@oracle.com/ - -- Steve -With the traces enabled + the "migration: ram block cpr blockers" patch -applied: - -Source: -cpr_find_fd pc.bios, id 0 returns -1 -cpr_save_fd pc.bios, id 0, fd 22 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -0x7fec18e00000 -cpr_find_fd pc.rom, id 0 returns -1 -cpr_save_fd pc.rom, id 0, fd 23 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -0x7fec18c00000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd 24 -host 0x7fec18a00000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 fd -25 host 0x7feb77e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 27 host -0x7fec18800000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 fd -28 host 0x7feb73c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 34 host -0x7fec18600000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 
-qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 35 -host 0x7fec18200000 -cpr_find_fd /rom@etc/table-loader, id 0 returns -1 -cpr_save_fd /rom@etc/table-loader, id 0, fd 36 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 36 host -0x7feb8b600000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 37 host -0x7feb8b400000 - -cpr_state_save cpr-transfer mode -cpr_transfer_output /var/run/alma8cpr-dst.sock -Target: -cpr_transfer_input /var/run/alma8cpr-dst.sock -cpr_state_load cpr-transfer mode -cpr_find_fd pc.bios, id 0 returns 20 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -0x7fcdc9800000 -cpr_find_fd pc.rom, id 0 returns 19 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -0x7fcdc9600000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd 18 -host 0x7fcdc9400000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 fd -17 host 0x7fcd27e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 16 host -0x7fcdc9200000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 fd -15 host 0x7fcd23c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 14 host -0x7fcdc8800000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 13 -host 0x7fcdc8400000 -cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 11 host -0x7fcdc8200000 -cpr_find_fd 
/rom@etc/acpi/rsdp, id 0 returns 10 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 10 host -0x7fcd3be00000 -Looks like both vga.vram and qxl.vram are being preserved (with the same -addresses), and no incompatible ram blocks are found during migration. -Sorry, addressed are not the same, of course. However corresponding ram -blocks do seem to be preserved and initialized. -So far, I have not reproduced the guest driver failure. - -However, I have isolated places where new QEMU improperly writes to -the qxl memory regions prior to starting the guest, by mmap'ing them -readonly after cpr: - - qemu_ram_alloc_internal() - if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) - ram_flags |= RAM_READONLY; - new_block = qemu_ram_alloc_from_fd(...) - -I have attached a draft fix; try it and let me know. -My console window looks fine before and after cpr, using --vnc $hostip:0 -vga qxl - -- Steve -0001-hw-qxl-cpr-support-preliminary.patch -Description: -Text document - -On 3/4/25 9:05 PM, Steven Sistare wrote: -> -On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: -> -> On 2/28/25 8:35 PM, Andrey Drobyshev wrote: -> ->> On 2/28/25 8:20 PM, Steven Sistare wrote: -> ->>> On 2/28/2025 1:13 PM, Steven Sistare wrote: -> ->>>> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -> ->>>>> Hi all, -> ->>>>> -> ->>>>> We've been experimenting with cpr-transfer migration mode recently -> ->>>>> and -> ->>>>> have discovered the following issue with the guest QXL driver: -> ->>>>> -> ->>>>> Run migration source: -> ->>>>>> EMULATOR=/path/to/emulator -> ->>>>>> ROOTFS=/path/to/image -> ->>>>>> QMPSOCK=/var/run/alma8qmp-src.sock -> ->>>>>> -> ->>>>>> $EMULATOR -enable-kvm \ -> ->>>>>>      -machine q35 \ -> ->>>>>>      -cpu host -smp 2 -m 2G \ -> ->>>>>>      -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -> ->>>>>> ram0,share=on\ -> ->>>>>>      -machine memory-backend=ram0 \ -> ->>>>>>      -machine aux-ram-share=on \ -> ->>>>>>      -drive 
file=$ROOTFS,media=disk,if=virtio \ -> ->>>>>>      -qmp unix:$QMPSOCK,server=on,wait=off \ -> ->>>>>>      -nographic \ -> ->>>>>>      -device qxl-vga -> ->>>>> -> ->>>>> Run migration target: -> ->>>>>> EMULATOR=/path/to/emulator -> ->>>>>> ROOTFS=/path/to/image -> ->>>>>> QMPSOCK=/var/run/alma8qmp-dst.sock -> ->>>>>> $EMULATOR -enable-kvm \ -> ->>>>>>      -machine q35 \ -> ->>>>>>      -cpu host -smp 2 -m 2G \ -> ->>>>>>      -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -> ->>>>>> ram0,share=on\ -> ->>>>>>      -machine memory-backend=ram0 \ -> ->>>>>>      -machine aux-ram-share=on \ -> ->>>>>>      -drive file=$ROOTFS,media=disk,if=virtio \ -> ->>>>>>      -qmp unix:$QMPSOCK,server=on,wait=off \ -> ->>>>>>      -nographic \ -> ->>>>>>      -device qxl-vga \ -> ->>>>>>      -incoming tcp:0:44444 \ -> ->>>>>>      -incoming '{"channel-type": "cpr", "addr": { "transport": -> ->>>>>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -> ->>>>> -> ->>>>> -> ->>>>> Launch the migration: -> ->>>>>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -> ->>>>>> QMPSOCK=/var/run/alma8qmp-src.sock -> ->>>>>> -> ->>>>>> $QMPSHELL -p $QMPSOCK <<EOF -> ->>>>>>      migrate-set-parameters mode=cpr-transfer -> ->>>>>>      migrate channels=[{"channel-type":"main","addr": -> ->>>>>> {"transport":"socket","type":"inet","host":"0","port":"44444"}}, -> ->>>>>> {"channel-type":"cpr","addr": -> ->>>>>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr- -> ->>>>>> dst.sock"}}] -> ->>>>>> EOF -> ->>>>> -> ->>>>> Then, after a while, QXL guest driver on target crashes spewing the -> ->>>>> following messages: -> ->>>>>> [  73.962002] [TTM] Buffer eviction failed -> ->>>>>> [  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, -> ->>>>>> 0x00000001) -> ->>>>>> [  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to -> ->>>>>> allocate VRAM BO -> ->>>>> -> ->>>>> That seems to be a known kernel QXL driver bug: -> 
->>>>> -> ->>>>> -https://lore.kernel.org/all/20220907094423.93581-1- -> ->>>>> min_halo@163.com/T/ -> ->>>>> -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -> ->>>>> -> ->>>>> (the latter discussion contains that reproduce script which speeds up -> ->>>>> the crash in the guest): -> ->>>>>> #!/bin/bash -> ->>>>>> -> ->>>>>> chvt 3 -> ->>>>>> -> ->>>>>> for j in $(seq 80); do -> ->>>>>>          echo "$(date) starting round $j" -> ->>>>>>          if [ "$(journalctl --boot | grep "failed to allocate VRAM -> ->>>>>> BO")" != "" ]; then -> ->>>>>>                  echo "bug was reproduced after $j tries" -> ->>>>>>                  exit 1 -> ->>>>>>          fi -> ->>>>>>          for i in $(seq 100); do -> ->>>>>>                  dmesg > /dev/tty3 -> ->>>>>>          done -> ->>>>>> done -> ->>>>>> -> ->>>>>> echo "bug could not be reproduced" -> ->>>>>> exit 0 -> ->>>>> -> ->>>>> The bug itself seems to remain unfixed, as I was able to reproduce -> ->>>>> that -> ->>>>> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -> ->>>>> cpr-transfer code also seems to be buggy as it triggers the crash - -> ->>>>> without the cpr-transfer migration the above reproduce doesn't -> ->>>>> lead to -> ->>>>> crash on the source VM. -> ->>>>> -> ->>>>> I suspect that, as cpr-transfer doesn't migrate the guest memory, but -> ->>>>> rather passes it through the memory backend object, our code might -> ->>>>> somehow corrupt the VRAM. However, I wasn't able to trace the -> ->>>>> corruption so far. -> ->>>>> -> ->>>>> Could somebody help the investigation and take a look into this? Any -> ->>>>> suggestions would be appreciated. Thanks! -> ->>>> -> ->>>> Possibly some memory region created by qxl is not being preserved. 
-> ->>>> Try adding these traces to see what is preserved: -> ->>>> -> ->>>> -trace enable='*cpr*' -> ->>>> -trace enable='*ram_alloc*' -> ->>> -> ->>> Also try adding this patch to see if it flags any ram blocks as not -> ->>> compatible with cpr. A message is printed at migration start time. -> ->>>   -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- -> ->>> email- -> ->>> steven.sistare@oracle.com/ -> ->>> -> ->>> - Steve -> ->>> -> ->> -> ->> With the traces enabled + the "migration: ram block cpr blockers" patch -> ->> applied: -> ->> -> ->> Source: -> ->>> cpr_find_fd pc.bios, id 0 returns -1 -> ->>> cpr_save_fd pc.bios, id 0, fd 22 -> ->>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -> ->>> 0x7fec18e00000 -> ->>> cpr_find_fd pc.rom, id 0 returns -1 -> ->>> cpr_save_fd pc.rom, id 0, fd 23 -> ->>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -> ->>> 0x7fec18c00000 -> ->>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -> ->>> cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -> ->>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -> ->>> 262144 fd 24 host 0x7fec18a00000 -> ->>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -> ->>> cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -> ->>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -> ->>> 67108864 fd 25 host 0x7feb77e00000 -> ->>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -> ->>> cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -> ->>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -> ->>> fd 27 host 0x7fec18800000 -> ->>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -> ->>> cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -> ->>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -> ->>> 67108864 fd 28 host 0x7feb73c00000 -> ->>> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -> ->>> cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -> ->>> 
qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -> ->>> fd 34 host 0x7fec18600000 -> ->>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -> ->>> cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 -> ->>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -> ->>> 2097152 fd 35 host 0x7fec18200000 -> ->>> cpr_find_fd /rom@etc/table-loader, id 0 returns -1 -> ->>> cpr_save_fd /rom@etc/table-loader, id 0, fd 36 -> ->>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -> ->>> fd 36 host 0x7feb8b600000 -> ->>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -> ->>> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 -> ->>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -> ->>> 37 host 0x7feb8b400000 -> ->>> -> ->>> cpr_state_save cpr-transfer mode -> ->>> cpr_transfer_output /var/run/alma8cpr-dst.sock -> ->> -> ->> Target: -> ->>> cpr_transfer_input /var/run/alma8cpr-dst.sock -> ->>> cpr_state_load cpr-transfer mode -> ->>> cpr_find_fd pc.bios, id 0 returns 20 -> ->>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -> ->>> 0x7fcdc9800000 -> ->>> cpr_find_fd pc.rom, id 0 returns 19 -> ->>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -> ->>> 0x7fcdc9600000 -> ->>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -> ->>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -> ->>> 262144 fd 18 host 0x7fcdc9400000 -> ->>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 -> ->>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -> ->>> 67108864 fd 17 host 0x7fcd27e00000 -> ->>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -> ->>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -> ->>> fd 16 host 0x7fcdc9200000 -> ->>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 -> ->>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -> ->>> 67108864 fd 15 host 0x7fcd23c00000 -> ->>> cpr_find_fd 
0000:00:02.0/qxl.rom, id 0 returns 14 -> ->>> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -> ->>> fd 14 host 0x7fcdc8800000 -> ->>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -> ->>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -> ->>> 2097152 fd 13 host 0x7fcdc8400000 -> ->>> cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -> ->>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -> ->>> fd 11 host 0x7fcdc8200000 -> ->>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 -> ->>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -> ->>> 10 host 0x7fcd3be00000 -> ->> -> ->> Looks like both vga.vram and qxl.vram are being preserved (with the same -> ->> addresses), and no incompatible ram blocks are found during migration. -> -> -> -> Sorry, addressed are not the same, of course. However corresponding ram -> -> blocks do seem to be preserved and initialized. -> -> -So far, I have not reproduced the guest driver failure. -> -> -However, I have isolated places where new QEMU improperly writes to -> -the qxl memory regions prior to starting the guest, by mmap'ing them -> -readonly after cpr: -> -> - qemu_ram_alloc_internal() -> -   if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) -> -       ram_flags |= RAM_READONLY; -> -   new_block = qemu_ram_alloc_from_fd(...) -> -> -I have attached a draft fix; try it and let me know. -> -My console window looks fine before and after cpr, using -> --vnc $hostip:0 -vga qxl -> -> -- Steve -Regarding the reproduce: when I launch the buggy version with the same -options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, -my VNC client silently hangs on the target after a while. Could it -happen on your stand as well? Could you try launching VM with -"-nographic -device qxl-vga"? That way VM's serial console is given you -directly in the shell, so when qxl driver crashes you're still able to -inspect the kernel messages. 
- -As for your patch, I can report that it doesn't resolve the issue as it -is. But I was able to track down another possible memory corruption -using your approach with readonly mmap'ing: - -> -Program terminated with signal SIGSEGV, Segmentation fault. -> -#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -> -412 d->ram->magic = cpu_to_le32(QXL_RAM_MAGIC); -> -[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] -> -(gdb) bt -> -#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -> -#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, -> -errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 -> -#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, -> -errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 -> -#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, -> -errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 -> -#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, -> -value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 -> -#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, -> -v=0x5638996f3770, name=0x56389759b141 "realized", opaque=0x5638987893d0, -> -errp=0x7ffd3c2b84e0) -> -at ../qom/object.c:2374 -> -#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, -> -name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) -> -at ../qom/object.c:1449 -> -#7 0x00005638970f8586 in object_property_set_qobject (obj=0x5638996e0e70, -> -name=0x56389759b141 "realized", value=0x5638996df900, errp=0x7ffd3c2b84e0) -> -at ../qom/qom-qobject.c:28 -> -#8 0x00005638970f3d8d in object_property_set_bool (obj=0x5638996e0e70, -> -name=0x56389759b141 "realized", value=true, errp=0x7ffd3c2b84e0) -> -at ../qom/object.c:1519 -> -#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, -> -bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 -> -#10 0x0000563896dba675 in qdev_device_add_from_qdict (opts=0x5638996dfe50, -> -from_json=false, errp=0x7ffd3c2b84e0) at 
../system/qdev-monitor.c:714 -> -#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, -> -errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733 -> -#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, opts=0x563898786150, -> -errp=0x56389855dc40 <error_fatal>) at ../system/vl.c:1207 -> -#13 0x000056389737a6cc in qemu_opts_foreach -> -(list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca -> -<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>) -> -at ../util/qemu-option.c:1135 -> -#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/vl.c:2745 -> -#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 -> -<error_fatal>) at ../system/vl.c:2806 -> -#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) at -> -../system/vl.c:3838 -> -#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at -> -../system/main.c:72 -So the attached adjusted version of your patch does seem to help. At -least I can't reproduce the crash on my stand. - -I'm wondering, could it be useful to explicitly mark all the reused -memory regions readonly upon cpr-transfer, and then make them writable -back again after the migration is done? That way we will be segfaulting -early on instead of debugging tricky memory corruptions. 
- -Andrey -0001-hw-qxl-cpr-support-preliminary.patch -Description: -Text Data - -On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: -On 3/4/25 9:05 PM, Steven Sistare wrote: -On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: -On 2/28/25 8:35 PM, Andrey Drobyshev wrote: -On 2/28/25 8:20 PM, Steven Sistare wrote: -On 2/28/2025 1:13 PM, Steven Sistare wrote: -On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -Hi all, - -We've been experimenting with cpr-transfer migration mode recently -and -have discovered the following issue with the guest QXL driver: - -Run migration source: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-src.sock - -$EMULATOR -enable-kvm \ -      -machine q35 \ -      -cpu host -smp 2 -m 2G \ -      -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -ram0,share=on\ -      -machine memory-backend=ram0 \ -      -machine aux-ram-share=on \ -      -drive file=$ROOTFS,media=disk,if=virtio \ -      -qmp unix:$QMPSOCK,server=on,wait=off \ -      -nographic \ -      -device qxl-vga -Run migration target: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-dst.sock -$EMULATOR -enable-kvm \ -      -machine q35 \ -      -cpu host -smp 2 -m 2G \ -      -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -ram0,share=on\ -      -machine memory-backend=ram0 \ -      -machine aux-ram-share=on \ -      -drive file=$ROOTFS,media=disk,if=virtio \ -      -qmp unix:$QMPSOCK,server=on,wait=off \ -      -nographic \ -      -device qxl-vga \ -      -incoming tcp:0:44444 \ -      -incoming '{"channel-type": "cpr", "addr": { "transport": -"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -Launch the migration: -QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -QMPSOCK=/var/run/alma8qmp-src.sock - -$QMPSHELL -p $QMPSOCK <<EOF -      migrate-set-parameters mode=cpr-transfer -      migrate channels=[{"channel-type":"main","addr": 
-{"transport":"socket","type":"inet","host":"0","port":"44444"}}, -{"channel-type":"cpr","addr": -{"transport":"socket","type":"unix","path":"/var/run/alma8cpr- -dst.sock"}}] -EOF -Then, after a while, QXL guest driver on target crashes spewing the -following messages: -[  73.962002] [TTM] Buffer eviction failed -[  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, -0x00000001) -[  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to -allocate VRAM BO -That seems to be a known kernel QXL driver bug: -https://lore.kernel.org/all/20220907094423.93581-1- -min_halo@163.com/T/ -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -(the latter discussion contains that reproduce script which speeds up -the crash in the guest): -#!/bin/bash - -chvt 3 - -for j in $(seq 80); do -          echo "$(date) starting round $j" -          if [ "$(journalctl --boot | grep "failed to allocate VRAM -BO")" != "" ]; then -                  echo "bug was reproduced after $j tries" -                  exit 1 -          fi -          for i in $(seq 100); do -                  dmesg > /dev/tty3 -          done -done - -echo "bug could not be reproduced" -exit 0 -The bug itself seems to remain unfixed, as I was able to reproduce -that -with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -cpr-transfer code also seems to be buggy as it triggers the crash - -without the cpr-transfer migration the above reproduce doesn't -lead to -crash on the source VM. - -I suspect that, as cpr-transfer doesn't migrate the guest memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM. However, I wasn't able to trace the -corruption so far. - -Could somebody help the investigation and take a look into this? Any -suggestions would be appreciated. Thanks! -Possibly some memory region created by qxl is not being preserved. 
-Try adding these traces to see what is preserved: - --trace enable='*cpr*' --trace enable='*ram_alloc*' -Also try adding this patch to see if it flags any ram blocks as not -compatible with cpr. A message is printed at migration start time. -   -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- -email- -steven.sistare@oracle.com/ - -- Steve -With the traces enabled + the "migration: ram block cpr blockers" patch -applied: - -Source: -cpr_find_fd pc.bios, id 0 returns -1 -cpr_save_fd pc.bios, id 0, fd 22 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -0x7fec18e00000 -cpr_find_fd pc.rom, id 0 returns -1 -cpr_save_fd pc.rom, id 0, fd 23 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -0x7fec18c00000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 24 host 0x7fec18a00000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 25 host 0x7feb77e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 27 host 0x7fec18800000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 28 host 0x7feb73c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 34 host 0x7fec18600000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 35 host 0x7fec18200000 -cpr_find_fd 
/rom@etc/table-loader, id 0 returns -1 -cpr_save_fd /rom@etc/table-loader, id 0, fd 36 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 36 host 0x7feb8b600000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -37 host 0x7feb8b400000 - -cpr_state_save cpr-transfer mode -cpr_transfer_output /var/run/alma8cpr-dst.sock -Target: -cpr_transfer_input /var/run/alma8cpr-dst.sock -cpr_state_load cpr-transfer mode -cpr_find_fd pc.bios, id 0 returns 20 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -0x7fcdc9800000 -cpr_find_fd pc.rom, id 0 returns 19 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -0x7fcdc9600000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 18 host 0x7fcdc9400000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 17 host 0x7fcd27e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 16 host 0x7fcdc9200000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 15 host 0x7fcd23c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 14 host 0x7fcdc8800000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 13 host 0x7fcdc8400000 -cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 11 host 0x7fcdc8200000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -10 host 0x7fcd3be00000 
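The two traces above can also be cross-checked mechanically instead of by eye. The following is only a sketch: it assumes the trace lines have exactly the shape shown in the log (`cpr_save_fd <name>, id <n>, fd <fd>` on the source and `cpr_find_fd <name>, id <n> returns <fd>` on the target), and the helper names are hypothetical, not part of QEMU.

```python
import re

# Line shapes copied from the trace output above, e.g.
#   cpr_save_fd pc.bios, id 0, fd 22
#   cpr_find_fd pc.bios, id 0 returns 20
SAVE_RE = re.compile(r"cpr_save_fd (\S+), id (\d+), fd (-?\d+)")
FIND_RE = re.compile(r"cpr_find_fd (\S+), id (\d+) returns (-?\d+)")


def saved_blocks(source_log):
    """Return {(name, id): fd} for every cpr_save_fd line in the source log."""
    return {(m.group(1), int(m.group(2))): int(m.group(3))
            for m in SAVE_RE.finditer(source_log)}


def found_blocks(target_log):
    """Return {(name, id): fd} for cpr_find_fd lines that returned a valid fd."""
    return {(m.group(1), int(m.group(2))): int(m.group(3))
            for m in FIND_RE.finditer(target_log)
            if int(m.group(3)) >= 0}


def missing_on_target(source_log, target_log):
    """List (name, id) pairs saved by the source that the target did not find."""
    return sorted(set(saved_blocks(source_log)) - set(found_blocks(target_log)))
```

An empty list from `missing_on_target()` would only show that every saved fd was found again on the target. It says nothing about whether the contents of those blocks were modified afterwards.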
-Looks like both vga.vram and qxl.vram are being preserved (with the same -addresses), and no incompatible ram blocks are found during migration. -Sorry, addresses are not the same, of course. However corresponding ram -blocks do seem to be preserved and initialized. -So far, I have not reproduced the guest driver failure. - -However, I have isolated places where new QEMU improperly writes to -the qxl memory regions prior to starting the guest, by mmap'ing them -readonly after cpr: - -  qemu_ram_alloc_internal() -    if (reused && (strstr(name, "qxl") || strstr(name, "vga"))) -        ram_flags |= RAM_READONLY; -    new_block = qemu_ram_alloc_from_fd(...) - -I have attached a draft fix; try it and let me know. -My console window looks fine before and after cpr, using --vnc $hostip:0 -vga qxl - -- Steve -Regarding the reproduce: when I launch the buggy version with the same -options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, -my VNC client silently hangs on the target after a while. Could it -happen on your stand as well? -cpr does not preserve the vnc connection and session. To test, I specify -port 0 for the source VM and port 1 for the dest. When the src vnc goes -dormant the dest vnc becomes active. -Could you try launching VM with -"-nographic -device qxl-vga"? That way VM's serial console is given you -directly in the shell, so when qxl driver crashes you're still able to -inspect the kernel messages. -I have been running like that, but have not reproduced the qxl driver crash, -and I suspect my guest image+kernel is too old. However, once I realized the -issue was post-cpr modification of qxl memory, I switched my attention to the -fix. -As for your patch, I can report that it doesn't resolve the issue as it -is. But I was able to track down another possible memory corruption -using your approach with readonly mmap'ing: -Program terminated with signal SIGSEGV, Segmentation fault. 
-#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -412 d->ram->magic = cpu_to_le32(QXL_RAM_MAGIC); -[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] -(gdb) bt -#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, -errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 -#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, -errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 -#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, -errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 -#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, value=true, -errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 -#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, v=0x5638996f3770, -name=0x56389759b141 "realized", opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) - at ../qom/object.c:2374 -#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, name=0x56389759b141 -"realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) - at ../qom/object.c:1449 -#7 0x00005638970f8586 in object_property_set_qobject (obj=0x5638996e0e70, -name=0x56389759b141 "realized", value=0x5638996df900, errp=0x7ffd3c2b84e0) - at ../qom/qom-qobject.c:28 -#8 0x00005638970f3d8d in object_property_set_bool (obj=0x5638996e0e70, -name=0x56389759b141 "realized", value=true, errp=0x7ffd3c2b84e0) - at ../qom/object.c:1519 -#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, bus=0x563898cf3c20, -errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 -#10 0x0000563896dba675 in qdev_device_add_from_qdict (opts=0x5638996dfe50, -from_json=false, errp=0x7ffd3c2b84e0) at ../system/qdev-monitor.c:714 -#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, errp=0x56389855dc40 -<error_fatal>) at ../system/qdev-monitor.c:733 -#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, opts=0x563898786150, -errp=0x56389855dc40 <error_fatal>) at ../system/vl.c:1207 -#13 0x000056389737a6cc in qemu_opts_foreach 
- (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca <device_init_func>, -opaque=0x0, errp=0x56389855dc40 <error_fatal>) - at ../util/qemu-option.c:1135 -#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/vl.c:2745 -#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 -<error_fatal>) at ../system/vl.c:2806 -#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) at -../system/vl.c:3838 -#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at -../system/main.c:72 -So the attached adjusted version of your patch does seem to help. At -least I can't reproduce the crash on my stand. -Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram are -definitely harmful. Try V2 of the patch, attached, which skips the lines -of init_qxl_ram that modify guest memory. -I'm wondering, could it be useful to explicitly mark all the reused -memory regions readonly upon cpr-transfer, and then make them writable -back again after the migration is done? That way we will be segfaulting -early on instead of debugging tricky memory corruptions. -It's a useful debugging technique, but changing protection on a large memory -region -can be too expensive for production due to TLB shootdowns. - -Also, there are cases where writes are performed but the value is guaranteed to -be the same: - qxl_post_load() - qxl_set_mode() - d->rom->mode = cpu_to_le32(modenr); -The value is the same because mode and shadow_rom.mode were passed in vmstate -from old qemu. 
- -- Steve -0001-hw-qxl-cpr-support-preliminary-V2.patch -Description: -Text document - -On 3/5/25 22:19, Steven Sistare wrote: -On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: -On 3/4/25 9:05 PM, Steven Sistare wrote: -On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: -On 2/28/25 8:35 PM, Andrey Drobyshev wrote: -On 2/28/25 8:20 PM, Steven Sistare wrote: -On 2/28/2025 1:13 PM, Steven Sistare wrote: -On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -Hi all, - -We've been experimenting with cpr-transfer migration mode recently -and -have discovered the following issue with the guest QXL driver: - -Run migration source: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-src.sock - -$EMULATOR -enable-kvm \ -      -machine q35 \ -      -cpu host -smp 2 -m 2G \ -      -object -memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -ram0,share=on\ -      -machine memory-backend=ram0 \ -      -machine aux-ram-share=on \ -      -drive file=$ROOTFS,media=disk,if=virtio \ -      -qmp unix:$QMPSOCK,server=on,wait=off \ -      -nographic \ -      -device qxl-vga -Run migration target: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-dst.sock -$EMULATOR -enable-kvm \ -      -machine q35 \ -      -cpu host -smp 2 -m 2G \ -      -object -memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -ram0,share=on\ -      -machine memory-backend=ram0 \ -      -machine aux-ram-share=on \ -      -drive file=$ROOTFS,media=disk,if=virtio \ -      -qmp unix:$QMPSOCK,server=on,wait=off \ -      -nographic \ -      -device qxl-vga \ -      -incoming tcp:0:44444 \ -      -incoming '{"channel-type": "cpr", "addr": { "transport": -"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -Launch the migration: -QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -QMPSOCK=/var/run/alma8qmp-src.sock - -$QMPSHELL -p $QMPSOCK <<EOF -      migrate-set-parameters mode=cpr-transfer -      migrate channels=[{"channel-type":"main","addr": 
-{"transport":"socket","type":"inet","host":"0","port":"44444"}}, -{"channel-type":"cpr","addr": -{"transport":"socket","type":"unix","path":"/var/run/alma8cpr- -dst.sock"}}] -EOF -Then, after a while, QXL guest driver on target crashes spewing -the -following messages: -[  73.962002] [TTM] Buffer eviction failed -[  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, -0x00000001) -[  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* -failed to -allocate VRAM BO -That seems to be a known kernel QXL driver bug: -https://lore.kernel.org/all/20220907094423.93581-1- -min_halo@163.com/T/ -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -(the latter discussion contains that reproduce script which -speeds up -the crash in the guest): -#!/bin/bash - -chvt 3 - -for j in $(seq 80); do -          echo "$(date) starting round $j" -          if [ "$(journalctl --boot | grep "failed to -allocate VRAM -BO")" != "" ]; then -                  echo "bug was reproduced after $j tries" -                  exit 1 -          fi -          for i in $(seq 100); do -                  dmesg > /dev/tty3 -          done -done - -echo "bug could not be reproduced" -exit 0 -The bug itself seems to remain unfixed, as I was able to reproduce -that -with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -cpr-transfer code also seems to be buggy as it triggers the -crash - -without the cpr-transfer migration the above reproduce doesn't -lead to -crash on the source VM. -I suspect that, as cpr-transfer doesn't migrate the guest -memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM. However, I wasn't able to trace the -corruption so far. -Could somebody help the investigation and take a look into -this? Any -suggestions would be appreciated. Thanks! -Possibly some memory region created by qxl is not being preserved. 
-Try adding these traces to see what is preserved: - --trace enable='*cpr*' --trace enable='*ram_alloc*' -Also try adding this patch to see if it flags any ram blocks as not -compatible with cpr. A message is printed at migration start time. -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- -email- -steven.sistare@oracle.com/ - -- Steve -With the traces enabled + the "migration: ram block cpr blockers" -patch -applied: - -Source: -cpr_find_fd pc.bios, id 0 returns -1 -cpr_save_fd pc.bios, id 0, fd 22 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -0x7fec18e00000 -cpr_find_fd pc.rom, id 0 returns -1 -cpr_save_fd pc.rom, id 0, fd 23 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -0x7fec18c00000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 24 host 0x7fec18a00000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 25 host 0x7feb77e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 27 host 0x7fec18800000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 28 host 0x7feb73c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 34 host 0x7fec18600000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 35 host 0x7fec18200000 -cpr_find_fd 
/rom@etc/table-loader, id 0 returns -1 -cpr_save_fd /rom@etc/table-loader, id 0, fd 36 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 36 host 0x7feb8b600000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -37 host 0x7feb8b400000 - -cpr_state_save cpr-transfer mode -cpr_transfer_output /var/run/alma8cpr-dst.sock -Target: -cpr_transfer_input /var/run/alma8cpr-dst.sock -cpr_state_load cpr-transfer mode -cpr_find_fd pc.bios, id 0 returns 20 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -0x7fcdc9800000 -cpr_find_fd pc.rom, id 0 returns 19 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -0x7fcdc9600000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 18 host 0x7fcdc9400000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 17 host 0x7fcd27e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 16 host 0x7fcdc9200000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 15 host 0x7fcd23c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 14 host 0x7fcdc8800000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 13 host 0x7fcdc8400000 -cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 11 host 0x7fcdc8200000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -10 host 0x7fcd3be00000 
-Looks like both vga.vram and qxl.vram are being preserved (with -the same -addresses), and no incompatible ram blocks are found during -migration. -Sorry, addresses are not the same, of course. However -corresponding ram -blocks do seem to be preserved and initialized. -So far, I have not reproduced the guest driver failure. - -However, I have isolated places where new QEMU improperly writes to -the qxl memory regions prior to starting the guest, by mmap'ing them -readonly after cpr: - -  qemu_ram_alloc_internal() -    if (reused && (strstr(name, "qxl") || strstr(name, "vga"))) -        ram_flags |= RAM_READONLY; -    new_block = qemu_ram_alloc_from_fd(...) - -I have attached a draft fix; try it and let me know. -My console window looks fine before and after cpr, using --vnc $hostip:0 -vga qxl - -- Steve -Regarding the reproduce: when I launch the buggy version with the same -options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, -my VNC client silently hangs on the target after a while. Could it -happen on your stand as well? -cpr does not preserve the vnc connection and session. To test, I specify -port 0 for the source VM and port 1 for the dest. When the src vnc goes -dormant the dest vnc becomes active. -Could you try launching VM with -"-nographic -device qxl-vga"? That way VM's serial console is given you -directly in the shell, so when qxl driver crashes you're still able to -inspect the kernel messages. -I have been running like that, but have not reproduced the qxl driver -crash, -and I suspect my guest image+kernel is too old. However, once I -realized the -issue was post-cpr modification of qxl memory, I switched my attention -to the -fix. -As for your patch, I can report that it doesn't resolve the issue as it -is. But I was able to track down another possible memory corruption -using your approach with readonly mmap'ing: -Program terminated with signal SIGSEGV, Segmentation fault. 
-#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -412        d->ram->magic      = cpu_to_le32(QXL_RAM_MAGIC); -[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] -(gdb) bt -#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, -errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 -#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, -errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 -#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, -errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 -#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, -value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 -#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, -v=0x5638996f3770, name=0x56389759b141 "realized", -opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) -    at ../qom/object.c:2374 -#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, -name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) -    at ../qom/object.c:1449 -#7 0x00005638970f8586 in object_property_set_qobject -(obj=0x5638996e0e70, name=0x56389759b141 "realized", -value=0x5638996df900, errp=0x7ffd3c2b84e0) -    at ../qom/qom-qobject.c:28 -#8 0x00005638970f3d8d in object_property_set_bool -(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, -errp=0x7ffd3c2b84e0) -    at ../qom/object.c:1519 -#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, -bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 -#10 0x0000563896dba675 in qdev_device_add_from_qdict -(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at -../system/qdev-monitor.c:714 -#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, -errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733 -#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, -opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at -../system/vl.c:1207 -#13 
0x000056389737a6cc in qemu_opts_foreach -    (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca -<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>) -    at ../util/qemu-option.c:1135 -#14 0x0000563896dc89b5 in qemu_create_cli_devices () at -../system/vl.c:2745 -#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 -<error_fatal>) at ../system/vl.c:2806 -#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) -at ../system/vl.c:3838 -#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at -../system/main.c:72 -So the attached adjusted version of your patch does seem to help. At -least I can't reproduce the crash on my stand. -Thanks for the stack trace; the calls to SPICE_RING_INIT in -init_qxl_ram are -definitely harmful. Try V2 of the patch, attached, which skips the lines -of init_qxl_ram that modify guest memory. -I'm wondering, could it be useful to explicitly mark all the reused -memory regions readonly upon cpr-transfer, and then make them writable -back again after the migration is done? That way we will be segfaulting -early on instead of debugging tricky memory corruptions. -It's a useful debugging technique, but changing protection on a large -memory region -can be too expensive for production due to TLB shootdowns. -Good point. Though we could move this code under non-default option to -avoid re-writing. 
- -Den - -On 3/5/25 11:19 PM, Steven Sistare wrote: -> -On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: -> -> On 3/4/25 9:05 PM, Steven Sistare wrote: -> ->> On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: -> ->>> On 2/28/25 8:35 PM, Andrey Drobyshev wrote: -> ->>>> On 2/28/25 8:20 PM, Steven Sistare wrote: -> ->>>>> On 2/28/2025 1:13 PM, Steven Sistare wrote: -> ->>>>>> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -> ->>>>>>> Hi all, -> ->>>>>>> -> ->>>>>>> We've been experimenting with cpr-transfer migration mode recently -> ->>>>>>> and -> ->>>>>>> have discovered the following issue with the guest QXL driver: -> ->>>>>>> -> ->>>>>>> Run migration source: -> ->>>>>>>> EMULATOR=/path/to/emulator -> ->>>>>>>> ROOTFS=/path/to/image -> ->>>>>>>> QMPSOCK=/var/run/alma8qmp-src.sock -> ->>>>>>>> -> ->>>>>>>> $EMULATOR -enable-kvm \ -> ->>>>>>>>       -machine q35 \ -> ->>>>>>>>       -cpu host -smp 2 -m 2G \ -> ->>>>>>>>       -object memory-backend-file,id=ram0,size=2G,mem-path=/ -> ->>>>>>>> dev/shm/ -> ->>>>>>>> ram0,share=on\ -> ->>>>>>>>       -machine memory-backend=ram0 \ -> ->>>>>>>>       -machine aux-ram-share=on \ -> ->>>>>>>>       -drive file=$ROOTFS,media=disk,if=virtio \ -> ->>>>>>>>       -qmp unix:$QMPSOCK,server=on,wait=off \ -> ->>>>>>>>       -nographic \ -> ->>>>>>>>       -device qxl-vga -> ->>>>>>> -> ->>>>>>> Run migration target: -> ->>>>>>>> EMULATOR=/path/to/emulator -> ->>>>>>>> ROOTFS=/path/to/image -> ->>>>>>>> QMPSOCK=/var/run/alma8qmp-dst.sock -> ->>>>>>>> $EMULATOR -enable-kvm \ -> ->>>>>>>>       -machine q35 \ -> ->>>>>>>>       -cpu host -smp 2 -m 2G \ -> ->>>>>>>>       -object memory-backend-file,id=ram0,size=2G,mem-path=/ -> ->>>>>>>> dev/shm/ -> ->>>>>>>> ram0,share=on\ -> ->>>>>>>>       -machine memory-backend=ram0 \ -> ->>>>>>>>       -machine aux-ram-share=on \ -> ->>>>>>>>       -drive file=$ROOTFS,media=disk,if=virtio \ -> ->>>>>>>>       -qmp unix:$QMPSOCK,server=on,wait=off \ -> ->>>>>>>>       -nographic \ -> ->>>>>>>>     
  -device qxl-vga \ -> ->>>>>>>>       -incoming tcp:0:44444 \ -> ->>>>>>>>       -incoming '{"channel-type": "cpr", "addr": { "transport": -> ->>>>>>>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -> ->>>>>>> -> ->>>>>>> -> ->>>>>>> Launch the migration: -> ->>>>>>>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -> ->>>>>>>> QMPSOCK=/var/run/alma8qmp-src.sock -> ->>>>>>>> -> ->>>>>>>> $QMPSHELL -p $QMPSOCK <<EOF -> ->>>>>>>>       migrate-set-parameters mode=cpr-transfer -> ->>>>>>>>       migrate channels=[{"channel-type":"main","addr": -> ->>>>>>>> {"transport":"socket","type":"inet","host":"0","port":"44444"}}, -> ->>>>>>>> {"channel-type":"cpr","addr": -> ->>>>>>>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr- -> ->>>>>>>> dst.sock"}}] -> ->>>>>>>> EOF -> ->>>>>>> -> ->>>>>>> Then, after a while, QXL guest driver on target crashes spewing the -> ->>>>>>> following messages: -> ->>>>>>>> [  73.962002] [TTM] Buffer eviction failed -> ->>>>>>>> [  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, -> ->>>>>>>> 0x00000001) -> ->>>>>>>> [  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to -> ->>>>>>>> allocate VRAM BO -> ->>>>>>> -> ->>>>>>> That seems to be a known kernel QXL driver bug: -> ->>>>>>> -> ->>>>>>> -https://lore.kernel.org/all/20220907094423.93581-1- -> ->>>>>>> min_halo@163.com/T/ -> ->>>>>>> -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -> ->>>>>>> -> ->>>>>>> (the latter discussion contains that reproduce script which -> ->>>>>>> speeds up -> ->>>>>>> the crash in the guest): -> ->>>>>>>> #!/bin/bash -> ->>>>>>>> -> ->>>>>>>> chvt 3 -> ->>>>>>>> -> ->>>>>>>> for j in $(seq 80); do -> ->>>>>>>>           echo "$(date) starting round $j" -> ->>>>>>>>           if [ "$(journalctl --boot | grep "failed to allocate -> ->>>>>>>> VRAM -> ->>>>>>>> BO")" != "" ]; then -> ->>>>>>>>                   echo "bug was reproduced after $j tries" -> ->>>>>>>>                   exit 
1 -> ->>>>>>>>           fi -> ->>>>>>>>           for i in $(seq 100); do -> ->>>>>>>>                   dmesg > /dev/tty3 -> ->>>>>>>>           done -> ->>>>>>>> done -> ->>>>>>>> -> ->>>>>>>> echo "bug could not be reproduced" -> ->>>>>>>> exit 0 -> ->>>>>>> -> ->>>>>>> The bug itself seems to remain unfixed, as I was able to reproduce -> ->>>>>>> that -> ->>>>>>> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -> ->>>>>>> cpr-transfer code also seems to be buggy as it triggers the crash - -> ->>>>>>> without the cpr-transfer migration the above reproduce doesn't -> ->>>>>>> lead to -> ->>>>>>> crash on the source VM. -> ->>>>>>> -> ->>>>>>> I suspect that, as cpr-transfer doesn't migrate the guest -> ->>>>>>> memory, but -> ->>>>>>> rather passes it through the memory backend object, our code might -> ->>>>>>> somehow corrupt the VRAM. However, I wasn't able to trace the -> ->>>>>>> corruption so far. -> ->>>>>>> -> ->>>>>>> Could somebody help the investigation and take a look into -> ->>>>>>> this? Any -> ->>>>>>> suggestions would be appreciated. Thanks! -> ->>>>>> -> ->>>>>> Possibly some memory region created by qxl is not being preserved. -> ->>>>>> Try adding these traces to see what is preserved: -> ->>>>>> -> ->>>>>> -trace enable='*cpr*' -> ->>>>>> -trace enable='*ram_alloc*' -> ->>>>> -> ->>>>> Also try adding this patch to see if it flags any ram blocks as not -> ->>>>> compatible with cpr. A message is printed at migration start time. 
-> ->>>>>    -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- -> ->>>>> email- -> ->>>>> steven.sistare@oracle.com/ -> ->>>>> -> ->>>>> - Steve -> ->>>>> -> ->>>> -> ->>>> With the traces enabled + the "migration: ram block cpr blockers" -> ->>>> patch -> ->>>> applied: -> ->>>> -> ->>>> Source: -> ->>>>> cpr_find_fd pc.bios, id 0 returns -1 -> ->>>>> cpr_save_fd pc.bios, id 0, fd 22 -> ->>>>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -> ->>>>> 0x7fec18e00000 -> ->>>>> cpr_find_fd pc.rom, id 0 returns -1 -> ->>>>> cpr_save_fd pc.rom, id 0, fd 23 -> ->>>>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -> ->>>>> 0x7fec18c00000 -> ->>>>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -> ->>>>> cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -> ->>>>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -> ->>>>> 262144 fd 24 host 0x7fec18a00000 -> ->>>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -> ->>>>> cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -> ->>>>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -> ->>>>> 67108864 fd 25 host 0x7feb77e00000 -> ->>>>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -> ->>>>> cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -> ->>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -> ->>>>> fd 27 host 0x7fec18800000 -> ->>>>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -> ->>>>> cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -> ->>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -> ->>>>> 67108864 fd 28 host 0x7feb73c00000 -> ->>>>> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -> ->>>>> cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -> ->>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -> ->>>>> fd 34 host 0x7fec18600000 -> ->>>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -> ->>>>> cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 
-> ->>>>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -> ->>>>> 2097152 fd 35 host 0x7fec18200000 -> ->>>>> cpr_find_fd /rom@etc/table-loader, id 0 returns -1 -> ->>>>> cpr_save_fd /rom@etc/table-loader, id 0, fd 36 -> ->>>>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -> ->>>>> fd 36 host 0x7feb8b600000 -> ->>>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -> ->>>>> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 -> ->>>>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -> ->>>>> 37 host 0x7feb8b400000 -> ->>>>> -> ->>>>> cpr_state_save cpr-transfer mode -> ->>>>> cpr_transfer_output /var/run/alma8cpr-dst.sock -> ->>>> -> ->>>> Target: -> ->>>>> cpr_transfer_input /var/run/alma8cpr-dst.sock -> ->>>>> cpr_state_load cpr-transfer mode -> ->>>>> cpr_find_fd pc.bios, id 0 returns 20 -> ->>>>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -> ->>>>> 0x7fcdc9800000 -> ->>>>> cpr_find_fd pc.rom, id 0 returns 19 -> ->>>>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -> ->>>>> 0x7fcdc9600000 -> ->>>>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -> ->>>>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -> ->>>>> 262144 fd 18 host 0x7fcdc9400000 -> ->>>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 -> ->>>>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -> ->>>>> 67108864 fd 17 host 0x7fcd27e00000 -> ->>>>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -> ->>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -> ->>>>> fd 16 host 0x7fcdc9200000 -> ->>>>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 -> ->>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -> ->>>>> 67108864 fd 15 host 0x7fcd23c00000 -> ->>>>> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 -> ->>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -> ->>>>> fd 14 host 
0x7fcdc8800000 -> ->>>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -> ->>>>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -> ->>>>> 2097152 fd 13 host 0x7fcdc8400000 -> ->>>>> cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -> ->>>>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -> ->>>>> fd 11 host 0x7fcdc8200000 -> ->>>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 -> ->>>>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -> ->>>>> 10 host 0x7fcd3be00000 -> ->>>> -> ->>>> Looks like both vga.vram and qxl.vram are being preserved (with the -> ->>>> same -> ->>>> addresses), and no incompatible ram blocks are found during migration. -> ->>> -> ->>> Sorry, addresses are not the same, of course. However corresponding -> ->>> ram -> ->>> blocks do seem to be preserved and initialized. -> ->> -> ->> So far, I have not reproduced the guest driver failure. -> ->> -> ->> However, I have isolated places where new QEMU improperly writes to -> ->> the qxl memory regions prior to starting the guest, by mmap'ing them -> ->> readonly after cpr: -> ->> -> ->>   qemu_ram_alloc_internal() -> ->>     if (reused && (strstr(name, "qxl") || strstr(name, "vga"))) -> ->>         ram_flags |= RAM_READONLY; -> ->>     new_block = qemu_ram_alloc_from_fd(...) -> ->> -> ->> I have attached a draft fix; try it and let me know. -> ->> My console window looks fine before and after cpr, using -> ->> -vnc $hostip:0 -vga qxl -> ->> -> ->> - Steve -> -> -> -> Regarding the reproduce: when I launch the buggy version with the same -> -> options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, -> -> my VNC client silently hangs on the target after a while. Could it -> -> happen on your stand as well? -> -> -cpr does not preserve the vnc connection and session. To test, I specify -> -port 0 for the source VM and port 1 for the dest. When the src vnc goes -> -dormant the dest vnc becomes active. 
-> -Sure, I meant that VNC on the dest (on port 1) works for a while -after the migration and then hangs, apparently after the guest QXL crash. - -> -> Could you try launching the VM with -> -> "-nographic -device qxl-vga"? That way the VM's serial console is given to you -> -> directly in the shell, so when the qxl driver crashes you're still able to -> -> inspect the kernel messages. -> -> -I have been running like that, but have not reproduced the qxl driver -> -crash, -> -and I suspect my guest image+kernel is too old. -Yes, that's probably the case. But the crash occurs on my Fedora 41 -guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to -be buggy. - - -> -However, once I realized the -> -issue was post-cpr modification of qxl memory, I switched my attention -> -to the -> -fix. -> -> -> As for your patch, I can report that it doesn't resolve the issue as it -> -> is. But I was able to track down another possible memory corruption -> -> using your approach with readonly mmap'ing: -> -> -> ->> Program terminated with signal SIGSEGV, Segmentation fault. 
-> ->> #0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -> ->> 412        d->ram->magic      = cpu_to_le32(QXL_RAM_MAGIC); -> ->> [Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] -> ->> (gdb) bt -> ->> #0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -> ->> #1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, -> ->> errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 -> ->> #2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, -> ->> errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 -> ->> #3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, -> ->> errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 -> ->> #4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, -> ->> value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 -> ->> #5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, -> ->> v=0x5638996f3770, name=0x56389759b141 "realized", -> ->> opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) -> ->>     at ../qom/object.c:2374 -> ->> #6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, -> ->> name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) -> ->>     at ../qom/object.c:1449 -> ->> #7 0x00005638970f8586 in object_property_set_qobject -> ->> (obj=0x5638996e0e70, name=0x56389759b141 "realized", -> ->> value=0x5638996df900, errp=0x7ffd3c2b84e0) -> ->>     at ../qom/qom-qobject.c:28 -> ->> #8 0x00005638970f3d8d in object_property_set_bool -> ->> (obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, -> ->> errp=0x7ffd3c2b84e0) -> ->>     at ../qom/object.c:1519 -> ->> #9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, -> ->> bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 -> ->> #10 0x0000563896dba675 in qdev_device_add_from_qdict -> ->> (opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ -> ->> system/qdev-monitor.c:714 -> ->> #11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, -> ->> 
errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733 -> ->> #12 0x0000563896dc48f1 in device_init_func (opaque=0x0, -> ->> opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at ../system/ -> ->> vl.c:1207 -> ->> #13 0x000056389737a6cc in qemu_opts_foreach -> ->>     (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca -> ->> <device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>) -> ->>     at ../util/qemu-option.c:1135 -> ->> #14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ -> ->> vl.c:2745 -> ->> #15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 -> ->> <error_fatal>) at ../system/vl.c:2806 -> ->> #16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) -> ->> at ../system/vl.c:3838 -> ->> #17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ -> ->> system/main.c:72 -> -> -> -> So the attached adjusted version of your patch does seem to help. At -> -> least I can't reproduce the crash on my stand. -> -> -Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram -> -are -> -definitely harmful. Try V2 of the patch, attached, which skips the lines -> -of init_qxl_ram that modify guest memory. -> -Thanks, your v2 patch does seem to prevent the crash. Would you re-send -it to the list as a proper fix? - -> -> I'm wondering, could it be useful to explicitly mark all the reused -> -> memory regions readonly upon cpr-transfer, and then make them writable -> -> back again after the migration is done? That way we will be segfaulting -> -> early on instead of debugging tricky memory corruptions. -> -> -It's a useful debugging technique, but changing protection on a large -> -memory region -> -can be too expensive for production due to TLB shootdowns. 
-> -> -Also, there are cases where writes are performed but the value is -> -guaranteed to -> -be the same: -> - qxl_post_load() -> -   qxl_set_mode() -> -     d->rom->mode = cpu_to_le32(modenr); -> -The value is the same because mode and shadow_rom.mode were passed in -> -vmstate -> -from old qemu. -> -There are also cases where devices' ROM might be re-initialized. E.g. -this segfault occurs upon further exploration of RO mapped RAM blocks: - -> -Program terminated with signal SIGSEGV, Segmentation fault. -> -#0 __memmove_avx_unaligned_erms () at -> -../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 -> -664 rep movsb -> -[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))] -> -(gdb) bt -> -#0 __memmove_avx_unaligned_erms () at -> -../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 -> -#1 0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, -> -owner=0x55aa2019ac10, name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true) -> -at ../hw/core/loader.c:1032 -> -#2 0x000055aa1d031577 in rom_add_blob -> -(name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, -> -max_len=2097152, addr=18446744073709551615, fw_file_name=0x55aa1da51f13 -> -"etc/acpi/tables", fw_callback=0x55aa1d441f59 <acpi_build_update>, -> -callback_opaque=0x55aa20ff0010, as=0x0, read_only=true) at -> -../hw/core/loader.c:1147 -> -#3 0x000055aa1cfd788d in acpi_add_rom_blob -> -(update=0x55aa1d441f59 <acpi_build_update>, opaque=0x55aa20ff0010, -> -blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at -> -../hw/acpi/utils.c:46 -> -#4 0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720 -> -#5 0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0) -> -at ../hw/i386/pc.c:638 -> -#6 0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 -> -<machine_init_done_notifiers>, data=0x0) at ../util/notify.c:39 -> -#7 0x000055aa1d039ee5 in qdev_machine_creation_done () at -> -../hw/core/machine.c:1749 -> -#8 
0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 -> -<error_fatal>) at ../system/vl.c:2779 -> -#9 0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 -> -<error_fatal>) at ../system/vl.c:2807 -> -#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at -> -../system/vl.c:3838 -> -#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at -> -../system/main.c:72 -I'm not sure whether ACPI tables ROM in particular is rewritten with the -same content, but there might be cases where ROM can be read from file -system upon initialization. That is undesirable as guest kernel -certainly won't be too happy about sudden change of the device's ROM -content. - -So the issue we're dealing with here is any unwanted memory related -device initialization upon cpr. - -For now the only thing that comes to my mind is to make a test where we -put as many devices as we can into a VM, make ram blocks RO upon cpr -(and remap them as RW later after migration is done, if needed), and -catch any unwanted memory violations. As Den suggested, we might -consider adding that behaviour as a separate non-default option (or -"migrate" command flag specific to cpr-transfer), which would only be -used in the testing. 
- -Andrey - -On 3/6/25 16:16, Andrey Drobyshev wrote: -On 3/5/25 11:19 PM, Steven Sistare wrote: -On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: -On 3/4/25 9:05 PM, Steven Sistare wrote: -On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: -On 2/28/25 8:35 PM, Andrey Drobyshev wrote: -On 2/28/25 8:20 PM, Steven Sistare wrote: -On 2/28/2025 1:13 PM, Steven Sistare wrote: -On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -Hi all, - -We've been experimenting with cpr-transfer migration mode recently -and -have discovered the following issue with the guest QXL driver: - -Run migration source: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-src.sock - -$EMULATOR -enable-kvm \ -       -machine q35 \ -       -cpu host -smp 2 -m 2G \ -       -object memory-backend-file,id=ram0,size=2G,mem-path=/ -dev/shm/ -ram0,share=on\ -       -machine memory-backend=ram0 \ -       -machine aux-ram-share=on \ -       -drive file=$ROOTFS,media=disk,if=virtio \ -       -qmp unix:$QMPSOCK,server=on,wait=off \ -       -nographic \ -       -device qxl-vga -Run migration target: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-dst.sock -$EMULATOR -enable-kvm \ -       -machine q35 \ -       -cpu host -smp 2 -m 2G \ -       -object memory-backend-file,id=ram0,size=2G,mem-path=/ -dev/shm/ -ram0,share=on\ -       -machine memory-backend=ram0 \ -       -machine aux-ram-share=on \ -       -drive file=$ROOTFS,media=disk,if=virtio \ -       -qmp unix:$QMPSOCK,server=on,wait=off \ -       -nographic \ -       -device qxl-vga \ -       -incoming tcp:0:44444 \ -       -incoming '{"channel-type": "cpr", "addr": { "transport": -"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -Launch the migration: -QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -QMPSOCK=/var/run/alma8qmp-src.sock - -$QMPSHELL -p $QMPSOCK <<EOF -       migrate-set-parameters mode=cpr-transfer -       migrate channels=[{"channel-type":"main","addr": 
-{"transport":"socket","type":"inet","host":"0","port":"44444"}}, -{"channel-type":"cpr","addr": -{"transport":"socket","type":"unix","path":"/var/run/alma8cpr- -dst.sock"}}] -EOF -Then, after a while, QXL guest driver on target crashes spewing the -following messages: -[  73.962002] [TTM] Buffer eviction failed -[  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, -0x00000001) -[  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to -allocate VRAM BO -That seems to be a known kernel QXL driver bug: -https://lore.kernel.org/all/20220907094423.93581-1- -min_halo@163.com/T/ -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -(the latter discussion contains that reproduce script which -speeds up -the crash in the guest): -#!/bin/bash - -chvt 3 - -for j in $(seq 80); do -           echo "$(date) starting round $j" -           if [ "$(journalctl --boot | grep "failed to allocate -VRAM -BO")" != "" ]; then -                   echo "bug was reproduced after $j tries" -                   exit 1 -           fi -           for i in $(seq 100); do -                   dmesg > /dev/tty3 -           done -done - -echo "bug could not be reproduced" -exit 0 -The bug itself seems to remain unfixed, as I was able to reproduce -that -with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -cpr-transfer code also seems to be buggy as it triggers the crash - -without the cpr-transfer migration the above reproduce doesn't -lead to -crash on the source VM. - -I suspect that, as cpr-transfer doesn't migrate the guest -memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM. However, I wasn't able to trace the -corruption so far. - -Could somebody help the investigation and take a look into -this? Any -suggestions would be appreciated. Thanks! -Possibly some memory region created by qxl is not being preserved. 
-Try adding these traces to see what is preserved: - --trace enable='*cpr*' --trace enable='*ram_alloc*' -Also try adding this patch to see if it flags any ram blocks as not -compatible with cpr. A message is printed at migration start time. -    -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- -email- -steven.sistare@oracle.com/ - -- Steve -With the traces enabled + the "migration: ram block cpr blockers" -patch -applied: - -Source: -cpr_find_fd pc.bios, id 0 returns -1 -cpr_save_fd pc.bios, id 0, fd 22 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -0x7fec18e00000 -cpr_find_fd pc.rom, id 0 returns -1 -cpr_save_fd pc.rom, id 0, fd 23 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -0x7fec18c00000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 24 host 0x7fec18a00000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 25 host 0x7feb77e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 27 host 0x7fec18800000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 28 host 0x7feb73c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 34 host 0x7fec18600000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 35 host 0x7fec18200000 -cpr_find_fd 
/rom@etc/table-loader, id 0 returns -1 -cpr_save_fd /rom@etc/table-loader, id 0, fd 36 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 36 host 0x7feb8b600000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -37 host 0x7feb8b400000 - -cpr_state_save cpr-transfer mode -cpr_transfer_output /var/run/alma8cpr-dst.sock -Target: -cpr_transfer_input /var/run/alma8cpr-dst.sock -cpr_state_load cpr-transfer mode -cpr_find_fd pc.bios, id 0 returns 20 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -0x7fcdc9800000 -cpr_find_fd pc.rom, id 0 returns 19 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -0x7fcdc9600000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 18 host 0x7fcdc9400000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 17 host 0x7fcd27e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 16 host 0x7fcdc9200000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 15 host 0x7fcd23c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 14 host 0x7fcdc8800000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 13 host 0x7fcdc8400000 -cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 11 host 0x7fcdc8200000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -10 host 0x7fcd3be00000 
-Looks like both vga.vram and qxl.vram are being preserved (with the -same -addresses), and no incompatible ram blocks are found during migration. -Sorry, addresses are not the same, of course. However corresponding -ram -blocks do seem to be preserved and initialized. -So far, I have not reproduced the guest driver failure. - -However, I have isolated places where new QEMU improperly writes to -the qxl memory regions prior to starting the guest, by mmap'ing them -readonly after cpr: - -   qemu_ram_alloc_internal() -     if (reused && (strstr(name, "qxl") || strstr(name, "vga"))) -         ram_flags |= RAM_READONLY; -     new_block = qemu_ram_alloc_from_fd(...) - -I have attached a draft fix; try it and let me know. -My console window looks fine before and after cpr, using --vnc $hostip:0 -vga qxl - -- Steve -Regarding the reproduction: when I launch the buggy version with the same -options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, -my VNC client silently hangs on the target after a while. Could it -happen on your setup as well? -cpr does not preserve the vnc connection and session. To test, I specify -port 0 for the source VM and port 1 for the dest. When the src vnc goes -dormant the dest vnc becomes active. -Sure, I meant that VNC on the dest (on port 1) works for a while -after the migration and then hangs, apparently after the guest QXL crash. -Could you try launching the VM with -"-nographic -device qxl-vga"? That way the VM's serial console is given to you -directly in the shell, so when the qxl driver crashes you're still able to -inspect the kernel messages. -I have been running like that, but have not reproduced the qxl driver -crash, -and I suspect my guest image+kernel is too old. -Yes, that's probably the case. But the crash occurs on my Fedora 41 -guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to -be buggy. 
-However, once I realized the -issue was post-cpr modification of qxl memory, I switched my attention -to the -fix. -As for your patch, I can report that it doesn't resolve the issue as it -is. But I was able to track down another possible memory corruption -using your approach with readonly mmap'ing: -Program terminated with signal SIGSEGV, Segmentation fault. -#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -412        d->ram->magic      = cpu_to_le32(QXL_RAM_MAGIC); -[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] -(gdb) bt -#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, -errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 -#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, -errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 -#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, -errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 -#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, -value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 -#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, -v=0x5638996f3770, name=0x56389759b141 "realized", -opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) -     at ../qom/object.c:2374 -#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, -name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) -     at ../qom/object.c:1449 -#7 0x00005638970f8586 in object_property_set_qobject -(obj=0x5638996e0e70, name=0x56389759b141 "realized", -value=0x5638996df900, errp=0x7ffd3c2b84e0) -     at ../qom/qom-qobject.c:28 -#8 0x00005638970f3d8d in object_property_set_bool -(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, -errp=0x7ffd3c2b84e0) -     at ../qom/object.c:1519 -#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, -bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 -#10 0x0000563896dba675 in qdev_device_add_from_qdict 
-(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ -system/qdev-monitor.c:714 -#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, -errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733 -#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, -opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at ../system/ -vl.c:1207 -#13 0x000056389737a6cc in qemu_opts_foreach -     (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca -<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>) -     at ../util/qemu-option.c:1135 -#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ -vl.c:2745 -#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 -<error_fatal>) at ../system/vl.c:2806 -#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) -at ../system/vl.c:3838 -#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ -system/main.c:72 -So the attached adjusted version of your patch does seem to help. At -least I can't reproduce the crash on my stand. -Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram -are -definitely harmful. Try V2 of the patch, attached, which skips the lines -of init_qxl_ram that modify guest memory. -Thanks, your v2 patch does seem to prevent the crash. Would you re-send -it to the list as a proper fix? -I'm wondering, could it be useful to explicitly mark all the reused -memory regions readonly upon cpr-transfer, and then make them writable -back again after the migration is done? That way we will be segfaulting -early on instead of debugging tricky memory corruptions. -It's a useful debugging technique, but changing protection on a large -memory region -can be too expensive for production due to TLB shootdowns. 
- -Also, there are cases where writes are performed but the value is -guaranteed to -be the same: -  qxl_post_load() -    qxl_set_mode() -      d->rom->mode = cpu_to_le32(modenr); -The value is the same because mode and shadow_rom.mode were passed in -vmstate -from old qemu. -There're also cases where devices' ROM might be re-initialized. E.g. -this segfault occures upon further exploration of RO mapped RAM blocks: -Program terminated with signal SIGSEGV, Segmentation fault. -#0 __memmove_avx_unaligned_erms () at -../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 -664 rep movsb -[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))] -(gdb) bt -#0 __memmove_avx_unaligned_erms () at -../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 -#1 0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, -name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true) - at ../hw/core/loader.c:1032 -#2 0x000055aa1d031577 in rom_add_blob - (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, -addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", -fw_callback=0x55aa1d441f59 <acpi_build_update>, callback_opaque=0x55aa20ff0010, as=0x0, -read_only=true) at ../hw/core/loader.c:1147 -#3 0x000055aa1cfd788d in acpi_add_rom_blob - (update=0x55aa1d441f59 <acpi_build_update>, opaque=0x55aa20ff0010, -blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46 -#4 0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720 -#5 0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0) -at ../hw/i386/pc.c:638 -#6 0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 -<machine_init_done_notifiers>, data=0x0) at ../util/notify.c:39 -#7 0x000055aa1d039ee5 in qdev_machine_creation_done () at -../hw/core/machine.c:1749 -#8 0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 -<error_fatal>) at ../system/vl.c:2779 -#9 0x000055aa1d2c7c7d 
in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 -<error_fatal>) at ../system/vl.c:2807 -#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at -../system/vl.c:3838 -#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at -../system/main.c:72 -I'm not sure whether ACPI tables ROM in particular is rewritten with the -same content, but there might be cases where ROM can be read from file -system upon initialization. That is undesirable as guest kernel -certainly won't be too happy about sudden change of the device's ROM -content. - -So the issue we're dealing with here is any unwanted memory related -device initialization upon cpr. - -For now the only thing that comes to my mind is to make a test where we -put as many devices as we can into a VM, make ram blocks RO upon cpr -(and remap them as RW later after migration is done, if needed), and -catch any unwanted memory violations. As Den suggested, we might -consider adding that behaviour as a separate non-default option (or -"migrate" command flag specific to cpr-transfer), which would only be -used in the testing. - -Andrey -No way. ACPI with the source must be used in the same way as BIOSes -and optional ROMs. - -Den - -On 3/6/2025 10:52 AM, Denis V. 
Lunev wrote: -On 3/6/25 16:16, Andrey Drobyshev wrote: -On 3/5/25 11:19 PM, Steven Sistare wrote: -On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: -On 3/4/25 9:05 PM, Steven Sistare wrote: -On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: -On 2/28/25 8:35 PM, Andrey Drobyshev wrote: -On 2/28/25 8:20 PM, Steven Sistare wrote: -On 2/28/2025 1:13 PM, Steven Sistare wrote: -On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -Hi all, - -We've been experimenting with cpr-transfer migration mode recently -and -have discovered the following issue with the guest QXL driver: - -Run migration source: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-src.sock - -$EMULATOR -enable-kvm \ -       -machine q35 \ -       -cpu host -smp 2 -m 2G \ -       -object memory-backend-file,id=ram0,size=2G,mem-path=/ -dev/shm/ -ram0,share=on\ -       -machine memory-backend=ram0 \ -       -machine aux-ram-share=on \ -       -drive file=$ROOTFS,media=disk,if=virtio \ -       -qmp unix:$QMPSOCK,server=on,wait=off \ -       -nographic \ -       -device qxl-vga -Run migration target: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-dst.sock -$EMULATOR -enable-kvm \ -       -machine q35 \ -       -cpu host -smp 2 -m 2G \ -       -object memory-backend-file,id=ram0,size=2G,mem-path=/ -dev/shm/ -ram0,share=on\ -       -machine memory-backend=ram0 \ -       -machine aux-ram-share=on \ -       -drive file=$ROOTFS,media=disk,if=virtio \ -       -qmp unix:$QMPSOCK,server=on,wait=off \ -       -nographic \ -       -device qxl-vga \ -       -incoming tcp:0:44444 \ -       -incoming '{"channel-type": "cpr", "addr": { "transport": -"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -Launch the migration: -QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -QMPSOCK=/var/run/alma8qmp-src.sock - -$QMPSHELL -p $QMPSOCK <<EOF -       migrate-set-parameters mode=cpr-transfer -       migrate channels=[{"channel-type":"main","addr": 
-{"transport":"socket","type":"inet","host":"0","port":"44444"}}, -{"channel-type":"cpr","addr": -{"transport":"socket","type":"unix","path":"/var/run/alma8cpr- -dst.sock"}}] -EOF -Then, after a while, QXL guest driver on target crashes spewing the -following messages: -[  73.962002] [TTM] Buffer eviction failed -[  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, -0x00000001) -[  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to -allocate VRAM BO -That seems to be a known kernel QXL driver bug: -https://lore.kernel.org/all/20220907094423.93581-1- -min_halo@163.com/T/ -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -(the latter discussion contains that reproduce script which -speeds up -the crash in the guest): -#!/bin/bash - -chvt 3 - -for j in $(seq 80); do -           echo "$(date) starting round $j" -           if [ "$(journalctl --boot | grep "failed to allocate -VRAM -BO")" != "" ]; then -                   echo "bug was reproduced after $j tries" -                   exit 1 -           fi -           for i in $(seq 100); do -                   dmesg > /dev/tty3 -           done -done - -echo "bug could not be reproduced" -exit 0 -The bug itself seems to remain unfixed, as I was able to reproduce -that -with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -cpr-transfer code also seems to be buggy as it triggers the crash - -without the cpr-transfer migration the above reproduce doesn't -lead to -crash on the source VM. - -I suspect that, as cpr-transfer doesn't migrate the guest -memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM. However, I wasn't able to trace the -corruption so far. - -Could somebody help the investigation and take a look into -this? Any -suggestions would be appreciated. Thanks! -Possibly some memory region created by qxl is not being preserved. 
-Try adding these traces to see what is preserved: - --trace enable='*cpr*' --trace enable='*ram_alloc*' -Also try adding this patch to see if it flags any ram blocks as not -compatible with cpr. A message is printed at migration start time. -    -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- -email- -steven.sistare@oracle.com/ - -- Steve -With the traces enabled + the "migration: ram block cpr blockers" -patch -applied: - -Source: -cpr_find_fd pc.bios, id 0 returns -1 -cpr_save_fd pc.bios, id 0, fd 22 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -0x7fec18e00000 -cpr_find_fd pc.rom, id 0 returns -1 -cpr_save_fd pc.rom, id 0, fd 23 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -0x7fec18c00000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 24 host 0x7fec18a00000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 25 host 0x7feb77e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 27 host 0x7fec18800000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 28 host 0x7feb73c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 34 host 0x7fec18600000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 35 host 0x7fec18200000 -cpr_find_fd 
/rom@etc/table-loader, id 0 returns -1 -cpr_save_fd /rom@etc/table-loader, id 0, fd 36 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 36 host 0x7feb8b600000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -37 host 0x7feb8b400000 - -cpr_state_save cpr-transfer mode -cpr_transfer_output /var/run/alma8cpr-dst.sock -Target: -cpr_transfer_input /var/run/alma8cpr-dst.sock -cpr_state_load cpr-transfer mode -cpr_find_fd pc.bios, id 0 returns 20 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -0x7fcdc9800000 -cpr_find_fd pc.rom, id 0 returns 19 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -0x7fcdc9600000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 18 host 0x7fcdc9400000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 17 host 0x7fcd27e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 16 host 0x7fcdc9200000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 15 host 0x7fcd23c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 14 host 0x7fcdc8800000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 13 host 0x7fcdc8400000 -cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 11 host 0x7fcdc8200000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -10 host 0x7fcd3be00000 
-Looks like both vga.vram and qxl.vram are being preserved (with the -same -addresses), and no incompatible ram blocks are found during migration. -Sorry, addressed are not the same, of course. However corresponding -ram -blocks do seem to be preserved and initialized. -So far, I have not reproduced the guest driver failure. - -However, I have isolated places where new QEMU improperly writes to -the qxl memory regions prior to starting the guest, by mmap'ing them -readonly after cpr: - -   qemu_ram_alloc_internal() -     if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) -         ram_flags |= RAM_READONLY; -     new_block = qemu_ram_alloc_from_fd(...) - -I have attached a draft fix; try it and let me know. -My console window looks fine before and after cpr, using --vnc $hostip:0 -vga qxl - -- Steve -Regarding the reproduce: when I launch the buggy version with the same -options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, -my VNC client silently hangs on the target after a while. Could it -happen on your stand as well? -cpr does not preserve the vnc connection and session. To test, I specify -port 0 for the source VM and port 1 for the dest. When the src vnc goes -dormant the dest vnc becomes active. -Sure, I meant that VNC on the dest (on the port 1) works for a while -after the migration and then hangs, apparently after the guest QXL crash. -Could you try launching VM with -"-nographic -device qxl-vga"? That way VM's serial console is given you -directly in the shell, so when qxl driver crashes you're still able to -inspect the kernel messages. -I have been running like that, but have not reproduced the qxl driver -crash, -and I suspect my guest image+kernel is too old. -Yes, that's probably the case. But the crash occurs on my Fedora 41 -guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to -be buggy. 
-However, once I realized the -issue was post-cpr modification of qxl memory, I switched my attention -to the -fix. -As for your patch, I can report that it doesn't resolve the issue as it -is. But I was able to track down another possible memory corruption -using your approach with readonly mmap'ing: -Program terminated with signal SIGSEGV, Segmentation fault. -#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -412        d->ram->magic      = cpu_to_le32(QXL_RAM_MAGIC); -[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] -(gdb) bt -#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, -errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 -#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, -errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 -#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, -errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 -#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, -value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 -#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, -v=0x5638996f3770, name=0x56389759b141 "realized", -opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) -     at ../qom/object.c:2374 -#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, -name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) -     at ../qom/object.c:1449 -#7 0x00005638970f8586 in object_property_set_qobject -(obj=0x5638996e0e70, name=0x56389759b141 "realized", -value=0x5638996df900, errp=0x7ffd3c2b84e0) -     at ../qom/qom-qobject.c:28 -#8 0x00005638970f3d8d in object_property_set_bool -(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, -errp=0x7ffd3c2b84e0) -     at ../qom/object.c:1519 -#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, -bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 -#10 0x0000563896dba675 in qdev_device_add_from_qdict 
-(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ -system/qdev-monitor.c:714 -#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, -errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733 -#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, -opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at ../system/ -vl.c:1207 -#13 0x000056389737a6cc in qemu_opts_foreach -     (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca -<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>) -     at ../util/qemu-option.c:1135 -#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ -vl.c:2745 -#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 -<error_fatal>) at ../system/vl.c:2806 -#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) -at ../system/vl.c:3838 -#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ -system/main.c:72 -So the attached adjusted version of your patch does seem to help. At -least I can't reproduce the crash on my stand. -Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram -are -definitely harmful. Try V2 of the patch, attached, which skips the lines -of init_qxl_ram that modify guest memory. -Thanks, your v2 patch does seem to prevent the crash. Would you re-send -it to the list as a proper fix? -Yes. Was waiting for your confirmation. -I'm wondering, could it be useful to explicitly mark all the reused -memory regions readonly upon cpr-transfer, and then make them writable -back again after the migration is done? That way we will be segfaulting -early on instead of debugging tricky memory corruptions. -It's a useful debugging technique, but changing protection on a large -memory region -can be too expensive for production due to TLB shootdowns. 
- -Also, there are cases where writes are performed but the value is -guaranteed to -be the same: -  qxl_post_load() -    qxl_set_mode() -      d->rom->mode = cpu_to_le32(modenr); -The value is the same because mode and shadow_rom.mode were passed in -vmstate -from old qemu. -There're also cases where devices' ROM might be re-initialized. E.g. -this segfault occures upon further exploration of RO mapped RAM blocks: -Program terminated with signal SIGSEGV, Segmentation fault. -#0 __memmove_avx_unaligned_erms () at -../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 -664            rep    movsb -[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))] -(gdb) bt -#0 __memmove_avx_unaligned_erms () at -../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 -#1 0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, -name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true) -    at ../hw/core/loader.c:1032 -#2 0x000055aa1d031577 in rom_add_blob -    (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, -addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", -fw_callback=0x55aa1d441f59 <acpi_build_update>, callback_opaque=0x55aa20ff0010, as=0x0, -read_only=true) at ../hw/core/loader.c:1147 -#3 0x000055aa1cfd788d in acpi_add_rom_blob -    (update=0x55aa1d441f59 <acpi_build_update>, opaque=0x55aa20ff0010, -blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46 -#4 0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720 -#5 0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0) -at ../hw/i386/pc.c:638 -#6 0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 -<machine_init_done_notifiers>, data=0x0) at ../util/notify.c:39 -#7 0x000055aa1d039ee5 in qdev_machine_creation_done () at -../hw/core/machine.c:1749 -#8 0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 -<error_fatal>) at ../system/vl.c:2779 
-#9 0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 -<error_fatal>) at ../system/vl.c:2807 -#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at -../system/vl.c:3838 -#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at -../system/main.c:72 -I'm not sure whether ACPI tables ROM in particular is rewritten with the -same content, but there might be cases where ROM can be read from file -system upon initialization. That is undesirable as guest kernel -certainly won't be too happy about sudden change of the device's ROM -content. - -So the issue we're dealing with here is any unwanted memory related -device initialization upon cpr. - -For now the only thing that comes to my mind is to make a test where we -put as many devices as we can into a VM, make ram blocks RO upon cpr -(and remap them as RW later after migration is done, if needed), and -catch any unwanted memory violations. As Den suggested, we might -consider adding that behaviour as a separate non-default option (or -"migrate" command flag specific to cpr-transfer), which would only be -used in the testing. -I'll look into adding an option, but there may be too many false positives, -such as the qxl_set_mode case above. And the maintainers may object to me -eliminating the false positives by adding more CPR_IN tests, due to gratuitous -(from their POV) ugliness. - -But I will use the technique to look for more write violations. -Andrey -No way. ACPI with the source must be used in the same way as BIOSes -and optional ROMs. -Yup, its a bug. Will fix. - -- Steve - -see -https://lore.kernel.org/qemu-devel/1741380954-341079-1-git-send-email-steven.sistare@oracle.com/ -- Steve - -On 3/6/2025 11:13 AM, Steven Sistare wrote: -On 3/6/2025 10:52 AM, Denis V. 
Lunev wrote: -On 3/6/25 16:16, Andrey Drobyshev wrote: -On 3/5/25 11:19 PM, Steven Sistare wrote: -On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: -On 3/4/25 9:05 PM, Steven Sistare wrote: -On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: -On 2/28/25 8:35 PM, Andrey Drobyshev wrote: -On 2/28/25 8:20 PM, Steven Sistare wrote: -On 2/28/2025 1:13 PM, Steven Sistare wrote: -On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -Hi all, - -We've been experimenting with cpr-transfer migration mode recently -and -have discovered the following issue with the guest QXL driver: - -Run migration source: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-src.sock - -$EMULATOR -enable-kvm \ -       -machine q35 \ -       -cpu host -smp 2 -m 2G \ -       -object memory-backend-file,id=ram0,size=2G,mem-path=/ -dev/shm/ -ram0,share=on\ -       -machine memory-backend=ram0 \ -       -machine aux-ram-share=on \ -       -drive file=$ROOTFS,media=disk,if=virtio \ -       -qmp unix:$QMPSOCK,server=on,wait=off \ -       -nographic \ -       -device qxl-vga -Run migration target: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-dst.sock -$EMULATOR -enable-kvm \ -       -machine q35 \ -       -cpu host -smp 2 -m 2G \ -       -object memory-backend-file,id=ram0,size=2G,mem-path=/ -dev/shm/ -ram0,share=on\ -       -machine memory-backend=ram0 \ -       -machine aux-ram-share=on \ -       -drive file=$ROOTFS,media=disk,if=virtio \ -       -qmp unix:$QMPSOCK,server=on,wait=off \ -       -nographic \ -       -device qxl-vga \ -       -incoming tcp:0:44444 \ -       -incoming '{"channel-type": "cpr", "addr": { "transport": -"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -Launch the migration: -QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -QMPSOCK=/var/run/alma8qmp-src.sock - -$QMPSHELL -p $QMPSOCK <<EOF -       migrate-set-parameters mode=cpr-transfer -       migrate channels=[{"channel-type":"main","addr": 
-{"transport":"socket","type":"inet","host":"0","port":"44444"}}, -{"channel-type":"cpr","addr": -{"transport":"socket","type":"unix","path":"/var/run/alma8cpr- -dst.sock"}}] -EOF -Then, after a while, QXL guest driver on target crashes spewing the -following messages: -[  73.962002] [TTM] Buffer eviction failed -[  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, -0x00000001) -[  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to -allocate VRAM BO -That seems to be a known kernel QXL driver bug: -https://lore.kernel.org/all/20220907094423.93581-1- -min_halo@163.com/T/ -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -(the latter discussion contains that reproduce script which -speeds up -the crash in the guest): -#!/bin/bash - -chvt 3 - -for j in $(seq 80); do -           echo "$(date) starting round $j" -           if [ "$(journalctl --boot | grep "failed to allocate -VRAM -BO")" != "" ]; then -                   echo "bug was reproduced after $j tries" -                   exit 1 -           fi -           for i in $(seq 100); do -                   dmesg > /dev/tty3 -           done -done - -echo "bug could not be reproduced" -exit 0 -The bug itself seems to remain unfixed, as I was able to reproduce -that -with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -cpr-transfer code also seems to be buggy as it triggers the crash - -without the cpr-transfer migration the above reproduce doesn't -lead to -crash on the source VM. - -I suspect that, as cpr-transfer doesn't migrate the guest -memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM. However, I wasn't able to trace the -corruption so far. - -Could somebody help the investigation and take a look into -this? Any -suggestions would be appreciated. Thanks! -Possibly some memory region created by qxl is not being preserved. 
-Try adding these traces to see what is preserved: - --trace enable='*cpr*' --trace enable='*ram_alloc*' -Also try adding this patch to see if it flags any ram blocks as not -compatible with cpr. A message is printed at migration start time. -    -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- -email- -steven.sistare@oracle.com/ - -- Steve -With the traces enabled + the "migration: ram block cpr blockers" -patch -applied: - -Source: -cpr_find_fd pc.bios, id 0 returns -1 -cpr_save_fd pc.bios, id 0, fd 22 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -0x7fec18e00000 -cpr_find_fd pc.rom, id 0 returns -1 -cpr_save_fd pc.rom, id 0, fd 23 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -0x7fec18c00000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 24 host 0x7fec18a00000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 25 host 0x7feb77e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 27 host 0x7fec18800000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 28 host 0x7feb73c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 34 host 0x7fec18600000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 35 host 0x7fec18200000 -cpr_find_fd 
/rom@etc/table-loader, id 0 returns -1 -cpr_save_fd /rom@etc/table-loader, id 0, fd 36 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 36 host 0x7feb8b600000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -37 host 0x7feb8b400000 - -cpr_state_save cpr-transfer mode -cpr_transfer_output /var/run/alma8cpr-dst.sock -Target: -cpr_transfer_input /var/run/alma8cpr-dst.sock -cpr_state_load cpr-transfer mode -cpr_find_fd pc.bios, id 0 returns 20 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -0x7fcdc9800000 -cpr_find_fd pc.rom, id 0 returns 19 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -0x7fcdc9600000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 18 host 0x7fcdc9400000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 17 host 0x7fcd27e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 16 host 0x7fcdc9200000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 15 host 0x7fcd23c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 14 host 0x7fcdc8800000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 13 host 0x7fcdc8400000 -cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 11 host 0x7fcdc8200000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -10 host 0x7fcd3be00000 
-Looks like both vga.vram and qxl.vram are being preserved (with the -same -addresses), and no incompatible ram blocks are found during migration. -Sorry, addressed are not the same, of course. However corresponding -ram -blocks do seem to be preserved and initialized. -So far, I have not reproduced the guest driver failure. - -However, I have isolated places where new QEMU improperly writes to -the qxl memory regions prior to starting the guest, by mmap'ing them -readonly after cpr: - -   qemu_ram_alloc_internal() -     if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) -         ram_flags |= RAM_READONLY; -     new_block = qemu_ram_alloc_from_fd(...) - -I have attached a draft fix; try it and let me know. -My console window looks fine before and after cpr, using --vnc $hostip:0 -vga qxl - -- Steve -Regarding the reproduce: when I launch the buggy version with the same -options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, -my VNC client silently hangs on the target after a while. Could it -happen on your stand as well? -cpr does not preserve the vnc connection and session. To test, I specify -port 0 for the source VM and port 1 for the dest. When the src vnc goes -dormant the dest vnc becomes active. -Sure, I meant that VNC on the dest (on the port 1) works for a while -after the migration and then hangs, apparently after the guest QXL crash. -Could you try launching VM with -"-nographic -device qxl-vga"? That way VM's serial console is given you -directly in the shell, so when qxl driver crashes you're still able to -inspect the kernel messages. -I have been running like that, but have not reproduced the qxl driver -crash, -and I suspect my guest image+kernel is too old. -Yes, that's probably the case. But the crash occurs on my Fedora 41 -guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to -be buggy. 
-However, once I realized the -issue was post-cpr modification of qxl memory, I switched my attention -to the -fix. -As for your patch, I can report that it doesn't resolve the issue as it -is. But I was able to track down another possible memory corruption -using your approach with readonly mmap'ing: -Program terminated with signal SIGSEGV, Segmentation fault. -#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -412        d->ram->magic      = cpu_to_le32(QXL_RAM_MAGIC); -[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] -(gdb) bt -#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, -errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 -#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, -errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 -#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, -errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 -#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, -value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 -#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, -v=0x5638996f3770, name=0x56389759b141 "realized", -opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) -     at ../qom/object.c:2374 -#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, -name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) -     at ../qom/object.c:1449 -#7 0x00005638970f8586 in object_property_set_qobject -(obj=0x5638996e0e70, name=0x56389759b141 "realized", -value=0x5638996df900, errp=0x7ffd3c2b84e0) -     at ../qom/qom-qobject.c:28 -#8 0x00005638970f3d8d in object_property_set_bool -(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, -errp=0x7ffd3c2b84e0) -     at ../qom/object.c:1519 -#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, -bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 -#10 0x0000563896dba675 in qdev_device_add_from_qdict 
-(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ -system/qdev-monitor.c:714 -#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, -errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733 -#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, -opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at ../system/ -vl.c:1207 -#13 0x000056389737a6cc in qemu_opts_foreach -     (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca -<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>) -     at ../util/qemu-option.c:1135 -#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ -vl.c:2745 -#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 -<error_fatal>) at ../system/vl.c:2806 -#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) -at ../system/vl.c:3838 -#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ -system/main.c:72 -So the attached adjusted version of your patch does seem to help. At -least I can't reproduce the crash on my stand. -Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram -are -definitely harmful. Try V2 of the patch, attached, which skips the lines -of init_qxl_ram that modify guest memory. -Thanks, your v2 patch does seem to prevent the crash. Would you re-send -it to the list as a proper fix? -Yes. Was waiting for your confirmation. -I'm wondering, could it be useful to explicitly mark all the reused -memory regions readonly upon cpr-transfer, and then make them writable -back again after the migration is done? That way we will be segfaulting -early on instead of debugging tricky memory corruptions. -It's a useful debugging technique, but changing protection on a large -memory region -can be too expensive for production due to TLB shootdowns. 
- -Also, there are cases where writes are performed but the value is -guaranteed to -be the same: -  qxl_post_load() -    qxl_set_mode() -      d->rom->mode = cpu_to_le32(modenr); -The value is the same because mode and shadow_rom.mode were passed in -vmstate -from old qemu. -There're also cases where devices' ROM might be re-initialized. E.g. -this segfault occures upon further exploration of RO mapped RAM blocks: -Program terminated with signal SIGSEGV, Segmentation fault. -#0 __memmove_avx_unaligned_erms () at -../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 -664            rep    movsb -[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))] -(gdb) bt -#0 __memmove_avx_unaligned_erms () at -../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 -#1 0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, -name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true) -    at ../hw/core/loader.c:1032 -#2 0x000055aa1d031577 in rom_add_blob -    (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, -addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", -fw_callback=0x55aa1d441f59 <acpi_build_update>, callback_opaque=0x55aa20ff0010, as=0x0, -read_only=true) at ../hw/core/loader.c:1147 -#3 0x000055aa1cfd788d in acpi_add_rom_blob -    (update=0x55aa1d441f59 <acpi_build_update>, opaque=0x55aa20ff0010, -blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46 -#4 0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720 -#5 0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0) -at ../hw/i386/pc.c:638 -#6 0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 -<machine_init_done_notifiers>, data=0x0) at ../util/notify.c:39 -#7 0x000055aa1d039ee5 in qdev_machine_creation_done () at -../hw/core/machine.c:1749 -#8 0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 -<error_fatal>) at ../system/vl.c:2779 
-#9 0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 -<error_fatal>) at ../system/vl.c:2807 -#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at -../system/vl.c:3838 -#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at -../system/main.c:72 -I'm not sure whether ACPI tables ROM in particular is rewritten with the -same content, but there might be cases where ROM can be read from file -system upon initialization. That is undesirable as guest kernel -certainly won't be too happy about sudden change of the device's ROM -content. - -So the issue we're dealing with here is any unwanted memory related -device initialization upon cpr. - -For now the only thing that comes to my mind is to make a test where we -put as many devices as we can into a VM, make ram blocks RO upon cpr -(and remap them as RW later after migration is done, if needed), and -catch any unwanted memory violations. As Den suggested, we might -consider adding that behaviour as a separate non-default option (or -"migrate" command flag specific to cpr-transfer), which would only be -used in the testing. -I'll look into adding an option, but there may be too many false positives, -such as the qxl_set_mode case above. And the maintainers may object to me -eliminating the false positives by adding more CPR_IN tests, due to gratuitous -(from their POV) ugliness. - -But I will use the technique to look for more write violations. -Andrey -No way. ACPI with the source must be used in the same way as BIOSes -and optional ROMs. -Yup, its a bug. Will fix. 
- -- Steve - diff --git a/results/classifier/016/virtual/46572227 b/results/classifier/016/virtual/46572227 deleted file mode 100644 index a2aa74d5..00000000 --- a/results/classifier/016/virtual/46572227 +++ /dev/null @@ -1,433 +0,0 @@ -virtual: 0.980 -x86: 0.924 -operating system: 0.179 -hypervisor: 0.141 -performance: 0.100 -debug: 0.091 -vnc: 0.088 -KVM: 0.065 -user-level: 0.060 -VMM: 0.025 -network: 0.025 -TCG: 0.021 -files: 0.015 -socket: 0.013 -boot: 0.009 -device: 0.009 -PID: 0.007 -permissions: 0.006 -assembly: 0.006 -register: 0.005 -kernel: 0.004 -semantic: 0.004 -architecture: 0.003 -alpha: 0.003 -peripherals: 0.003 -risc-v: 0.002 -graphic: 0.001 -ppc: 0.001 -i386: 0.001 -mistranslation: 0.000 -arm: 0.000 - -[Qemu-devel] [Bug?] Windows 7's time drift obviously while RTC rate switching frequently between high and low timer rate - -Hi, - -We tested with the latest QEMU, and found that time drift obviously (clock fast -in guest) -in Windows 7 64 bits guest in some cases. - -It is easily to reproduce, using the follow QEMU command line to start windows -7: - -# x86_64-softmmu/qemu-system-x86_64 -name win7_64_2U_raw -machine -pc-i440fx-2.6,accel=kvm,usb=off -cpu host -m 2048 -realtime mlock=off -smp -4,sockets=2,cores=2,threads=1 -rtc base=utc,clock=vm,driftfix=slew -no-hpet --global kvm-pit.lost_tick_policy=discard -hda /mnt/nfs/win7_sp1_32_2U_raw -vnc -:11 -netdev tap,id=hn0,vhost=off -device rtl8139,id=net-pci0,netdev=hn0 -device -piix3-usb-uhci,id=usb -device usb-tablet,id=input0 -device usb-mouse,id=input1 --device usb-kbd,id=input2 -monitor stdio - -Adjust the VM's time to host time, and run java application or run the follow -program -in windows 7: - -#pragma comment(lib, "winmm") -#include <stdio.h> -#include <windows.h> - -#define SWITCH_PEROID 13 - -int main() -{ - DWORD count = 0; - - while (1) - { - count++; - timeBeginPeriod(1); - DWORD start = timeGetTime(); - Sleep(40); - timeEndPeriod(1); - if ((count % SWITCH_PEROID) == 0) { - Sleep(1); - } - 
} - return 0; -} - -After few minutes, you will find that the time in windows 7 goes ahead of the -host time, drifts about several seconds. - -I have dug deeper in this problem. For windows systems that use the CMOS timer, -the base interrupt rate is usually 64Hz, but running some application in VM -will raise the timer rate to 1024Hz, running java application and or above -program will raise the timer rate. -Besides, Windows operating systems generally keep time by counting timer -interrupts (ticks). But QEMU seems not emulate the rate converting fine. - -We update the timer in function periodic_timer_update(): -static void periodic_timer_update(RTCState *s, int64_t current_time) -{ - - cur_clock = muldiv64(current_time, RTC_CLOCK_RATE, get_ticks_per_sec()); - next_irq_clock = (cur_clock & ~(period - 1)) + period; - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Here we calculate the next interrupt time by align the current clock with the -new period, I'm a little confused that why we care about the *history* time ? -If VM switches from high rate to low rate, the next interrupt time may come -earlier than it supposed to be. We have observed it in our test. we printed the -interval time of interrupts and the VM's current time (We got the time from VM). - -Here is part of the log: -... ... -period=512 irq inject 1534: 15625 us -Tue Mar 29 04:38:00 2016 -*irq_num_period_32=0, irq_num_period_512=64: [3]: Real time interval is 999696 -us -... ... -*irq_num_period_32=893, irq_num_period_512=9 [81]: Real time interval is 951086 -us -Convert 32 --- > 512: 703: 96578 us -period=512 irq inject 44391: 12702 us -Convert 512 --- > 32: 704: 12704 us11 -period=32 irq inject 44392: 979 us -... ... -32 --- > 512: 705: 24388 us -period=512 irq inject 44417: 6834 us -Convert 512 --- > 32: 706: 6830 us -period=32 irq inject 44418: 978 us -... ... 
-Convert 32 --- > 512: 707: 60525 us -period=512 irq inject 44480: 1945 us -Convert 512 --- > 32: 708: 1955 us -period=32 irq inject 44481: 977 us -... ... -Convert 32 --- > 512: 709: 36105 us -period=512 irq inject 44518: 10741 us -Convert 512 --- > 32: 710: 10736 us -period=32 irq inject 44519: 989 us -... ... -Convert 32 --- > 512: 711: 123998 us -period=512 irq inject 44646: 974 us -period=512 irq inject 44647: 15607 us -Convert 512 --- > 32: 712: 16560 us -period=32 irq inject 44648: 980 us -... ... -period=32 irq inject 44738: 974 us -Convert 32 --- > 512: 713: 88828 us -period=512 irq inject 44739: 4885 us -Convert 512 --- > 32: 714: 4882 us -period=32 irq inject 44740: 989 us -... ... -period=32 irq inject 44842: 974 us -Convert 32 --- > 512: 715: 100537 us -period=512 irq inject 44843: 8788 us -Convert 512 --- > 32: 716: 8789 us -period=32 irq inject 44844: 972 us -... ... -period=32 irq inject 44941: 979 us -Convert 32 --- > 512: 717: 95677 us -period=512 irq inject 44942: 13661 us -Convert 512 --- > 32: 718: 13657 us -period=32 irq inject 44943: 987 us -... ... -Convert 32 --- > 512: 719: 94690 us -period=512 irq inject 45040: 14643 us -Convert 512 --- > 32: 720: 14642 us -period=32 irq inject 45041: 974 us -... ... -Convert 32 --- > 512: 721: 88848 us -period=512 irq inject 45132: 4892 us -Convert 512 --- > 32: 722: 4931 us -period=32 irq inject 45133: 964 us -... ... -Tue Mar 29 04:39:19 2016 -*irq_num_period_32:835, irq_num_period_512:11 [82], Real time interval is -911520 us - -For windows 7, it has got 835 IRQs which injected during the period of 32, -and got 11 IRQs that injected during the period of 512. it updated the -wall-clock -time with one second, because it supposed it has counted -(835*976.5+11*15625)= 987252.5 us, but the real interval time is 911520 us. - -IMHO, we should calculate the next interrupt time based on the time of last -interrupt injected, and it seems to be more similar with hardware CMOS timer -in this way. 
-Maybe someone can tell me the reason why we calculated the interrupt timer -in that way, or is it a bug ? ;) - -Thanks, -Hailiang - -ping... - -It seems that we can eliminate the drift by the following patch. -(I tested it for two hours, and there is no drift, before, the timer -in Windows 7 drifts about 2 seconds per minute.) I'm not sure if it is -the right way to solve the problem. -Any comments are welcomed. Thanks. - -From bd6acd577cbbc9d92d6376c770219470f184f7de Mon Sep 17 00:00:00 2001 -From: zhanghailiang <address@hidden> -Date: Thu, 31 Mar 2016 16:36:15 -0400 -Subject: [PATCH] timer/mc146818rtc: fix timer drift in Windows OS while RTC - rate converting frequently - -Signed-off-by: zhanghailiang <address@hidden> ---- - hw/timer/mc146818rtc.c | 25 ++++++++++++++++++++++--- - 1 file changed, 22 insertions(+), 3 deletions(-) - -diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c -index 2ac0fd3..e39d2da 100644 ---- a/hw/timer/mc146818rtc.c -+++ b/hw/timer/mc146818rtc.c -@@ -79,6 +79,7 @@ typedef struct RTCState { - /* periodic timer */ - QEMUTimer *periodic_timer; - int64_t next_periodic_time; -+ uint64_t last_periodic_time; - /* update-ended timer */ - QEMUTimer *update_timer; - uint64_t next_alarm_time; -@@ -152,7 +153,8 @@ static void rtc_coalesced_timer(void *opaque) - static void periodic_timer_update(RTCState *s, int64_t current_time) - { - int period_code, period; -- int64_t cur_clock, next_irq_clock; -+ int64_t cur_clock, next_irq_clock, pre_irq_clock; -+ bool change = false; - - period_code = s->cmos_data[RTC_REG_A] & 0x0f; - if (period_code != 0 -@@ -165,14 +167,28 @@ static void periodic_timer_update(RTCState *s, int64_t -current_time) - if (period != s->period) { - s->irq_coalesced = (s->irq_coalesced * s->period) / period; - DPRINTF_C("cmos: coalesced irqs scaled to %d\n", s->irq_coalesced); -+ if (s->period && period) { -+ change = true; -+ } - } - s->period = period; - #endif - /* compute 32 khz clock */ - cur_clock = - 
muldiv64(current_time, RTC_CLOCK_RATE, NANOSECONDS_PER_SECOND); -+ if (change) { -+ int offset = 0; - -- next_irq_clock = (cur_clock & ~(period - 1)) + period; -+ pre_irq_clock = muldiv64(s->last_periodic_time, RTC_CLOCK_RATE, -+ NANOSECONDS_PER_SECOND); -+ if ((cur_clock - pre_irq_clock) > period) { -+ offset = (cur_clock - pre_irq_clock) / period; -+ } -+ s->irq_coalesced += offset; -+ next_irq_clock = pre_irq_clock + (offset + 1) * period; -+ } else { -+ next_irq_clock = (cur_clock & ~(period - 1)) + period; -+ } - s->next_periodic_time = muldiv64(next_irq_clock, -NANOSECONDS_PER_SECOND, - RTC_CLOCK_RATE) + 1; - timer_mod(s->periodic_timer, s->next_periodic_time); -@@ -187,7 +203,9 @@ static void periodic_timer_update(RTCState *s, int64_t -current_time) - static void rtc_periodic_timer(void *opaque) - { - RTCState *s = opaque; -- -+ int64_t next_periodic_time; -+ -+ next_periodic_time = s->next_periodic_time; - periodic_timer_update(s, s->next_periodic_time); - s->cmos_data[RTC_REG_C] |= REG_C_PF; - if (s->cmos_data[RTC_REG_B] & REG_B_PIE) { -@@ -204,6 +222,7 @@ static void rtc_periodic_timer(void *opaque) - DPRINTF_C("cmos: coalesced irqs increased to %d\n", - s->irq_coalesced); - } -+ s->last_periodic_time = next_periodic_time; - } else - #endif - qemu_irq_raise(s->irq); --- -1.8.3.1 - - -On 2016/3/29 19:58, Hailiang Zhang wrote: -Hi, - -We tested with the latest QEMU, and found that time drift obviously (clock fast -in guest) -in Windows 7 64 bits guest in some cases. 
- -It is easily to reproduce, using the follow QEMU command line to start windows -7: - -# x86_64-softmmu/qemu-system-x86_64 -name win7_64_2U_raw -machine -pc-i440fx-2.6,accel=kvm,usb=off -cpu host -m 2048 -realtime mlock=off -smp -4,sockets=2,cores=2,threads=1 -rtc base=utc,clock=vm,driftfix=slew -no-hpet --global kvm-pit.lost_tick_policy=discard -hda /mnt/nfs/win7_sp1_32_2U_raw -vnc -:11 -netdev tap,id=hn0,vhost=off -device rtl8139,id=net-pci0,netdev=hn0 -device -piix3-usb-uhci,id=usb -device usb-tablet,id=input0 -device usb-mouse,id=input1 --device usb-kbd,id=input2 -monitor stdio - -Adjust the VM's time to host time, and run java application or run the follow -program -in windows 7: - -#pragma comment(lib, "winmm") -#include <stdio.h> -#include <windows.h> - -#define SWITCH_PEROID 13 - -int main() -{ - DWORD count = 0; - - while (1) - { - count++; - timeBeginPeriod(1); - DWORD start = timeGetTime(); - Sleep(40); - timeEndPeriod(1); - if ((count % SWITCH_PEROID) == 0) { - Sleep(1); - } - } - return 0; -} - -After few minutes, you will find that the time in windows 7 goes ahead of the -host time, drifts about several seconds. - -I have dug deeper in this problem. For windows systems that use the CMOS timer, -the base interrupt rate is usually 64Hz, but running some application in VM -will raise the timer rate to 1024Hz, running java application and or above -program will raise the timer rate. -Besides, Windows operating systems generally keep time by counting timer -interrupts (ticks). But QEMU seems not emulate the rate converting fine. 
- -We update the timer in function periodic_timer_update(): -static void periodic_timer_update(RTCState *s, int64_t current_time) -{ - - cur_clock = muldiv64(current_time, RTC_CLOCK_RATE, -get_ticks_per_sec()); - next_irq_clock = (cur_clock & ~(period - 1)) + period; - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Here we calculate the next interrupt time by align the current clock with the -new period, I'm a little confused that why we care about the *history* time ? -If VM switches from high rate to low rate, the next interrupt time may come -earlier than it supposed to be. We have observed it in our test. we printed the -interval time of interrupts and the VM's current time (We got the time from VM). - -Here is part of the log: -... ... -period=512 irq inject 1534: 15625 us -Tue Mar 29 04:38:00 2016 -*irq_num_period_32=0, irq_num_period_512=64: [3]: Real time interval is 999696 -us -... ... -*irq_num_period_32=893, irq_num_period_512=9 [81]: Real time interval is 951086 -us -Convert 32 --- > 512: 703: 96578 us -period=512 irq inject 44391: 12702 us -Convert 512 --- > 32: 704: 12704 us11 -period=32 irq inject 44392: 979 us -... ... -32 --- > 512: 705: 24388 us -period=512 irq inject 44417: 6834 us -Convert 512 --- > 32: 706: 6830 us -period=32 irq inject 44418: 978 us -... ... -Convert 32 --- > 512: 707: 60525 us -period=512 irq inject 44480: 1945 us -Convert 512 --- > 32: 708: 1955 us -period=32 irq inject 44481: 977 us -... ... -Convert 32 --- > 512: 709: 36105 us -period=512 irq inject 44518: 10741 us -Convert 512 --- > 32: 710: 10736 us -period=32 irq inject 44519: 989 us -... ... -Convert 32 --- > 512: 711: 123998 us -period=512 irq inject 44646: 974 us -period=512 irq inject 44647: 15607 us -Convert 512 --- > 32: 712: 16560 us -period=32 irq inject 44648: 980 us -... ... -period=32 irq inject 44738: 974 us -Convert 32 --- > 512: 713: 88828 us -period=512 irq inject 44739: 4885 us -Convert 512 --- > 32: 714: 4882 us -period=32 irq inject 44740: 989 us -... ... 
-period=32 irq inject 44842: 974 us -Convert 32 --- > 512: 715: 100537 us -period=512 irq inject 44843: 8788 us -Convert 512 --- > 32: 716: 8789 us -period=32 irq inject 44844: 972 us -... ... -period=32 irq inject 44941: 979 us -Convert 32 --- > 512: 717: 95677 us -period=512 irq inject 44942: 13661 us -Convert 512 --- > 32: 718: 13657 us -period=32 irq inject 44943: 987 us -... ... -Convert 32 --- > 512: 719: 94690 us -period=512 irq inject 45040: 14643 us -Convert 512 --- > 32: 720: 14642 us -period=32 irq inject 45041: 974 us -... ... -Convert 32 --- > 512: 721: 88848 us -period=512 irq inject 45132: 4892 us -Convert 512 --- > 32: 722: 4931 us -period=32 irq inject 45133: 964 us -... ... -Tue Mar 29 04:39:19 2016 -*irq_num_period_32:835, irq_num_period_512:11 [82], Real time interval is -911520 us - -For windows 7, it has got 835 IRQs which injected during the period of 32, -and got 11 IRQs that injected during the period of 512. it updated the -wall-clock -time with one second, because it supposed it has counted -(835*976.5+11*15625)= 987252.5 us, but the real interval time is 911520 us. - -IMHO, we should calculate the next interrupt time based on the time of last -interrupt injected, and it seems to be more similar with hardware CMOS timer -in this way. -Maybe someone can tell me the reason why we calculated the interrupt timer -in that way, or is it a bug ? 
;) - -Thanks, -Hailiang - diff --git a/results/classifier/016/virtual/53568181 b/results/classifier/016/virtual/53568181 deleted file mode 100644 index 1a1dafcd..00000000 --- a/results/classifier/016/virtual/53568181 +++ /dev/null @@ -1,105 +0,0 @@ -virtual: 0.966 -x86: 0.963 -performance: 0.800 -KVM: 0.696 -kernel: 0.598 -debug: 0.238 -hypervisor: 0.107 -device: 0.097 -graphic: 0.058 -operating system: 0.053 -i386: 0.045 -files: 0.018 -user-level: 0.011 -register: 0.009 -TCG: 0.007 -architecture: 0.006 -semantic: 0.006 -PID: 0.005 -alpha: 0.005 -socket: 0.004 -peripherals: 0.004 -VMM: 0.003 -network: 0.003 -risc-v: 0.002 -permissions: 0.002 -boot: 0.002 -assembly: 0.001 -vnc: 0.001 -mistranslation: 0.001 -ppc: 0.000 -arm: 0.000 - -[BUG] x86/PAT handling severely crippled AMD-V SVM KVM performance - -Hi, I maintain an out-of-tree 3D APIs pass-through QEMU device models at -https://github.com/kjliew/qemu-3dfx -that provide 3D acceleration for legacy -32-bit Windows guests (Win98SE, WinME, Win2k and WinXP) with the focus on -playing old legacy games from 1996-2003. It currently supports the now-defunct -3Dfx propriety API called Glide and an alternative OpenGL pass-through based on -MESA implementation. - -The basic concept of both implementations create memory-mapped virtual -interfaces consist of host/guest shared memory with guest-push model instead of -a more common host-pull model for typical QEMU device model implementation. -Guest uses shared memory as FIFOs for drawing commands and data to bulk up the -operations until serialization event that flushes the FIFOs into host. This -achieves extremely good performance since virtual CPUs are fast with hardware -acceleration (Intel VT/AMD-V) and reduces the overhead of frequent VMEXITs to -service the device emulation. Both implementations work on Windows 10 with WHPX -and HAXM accelerators as well as KVM in Linux. - -On Windows 10, QEMU WHPX implementation does not sync MSR_IA32_PAT during -host/guest states sync. 
There is no visibility into the closed-source WHPX on -how things are managed behind the scene, but from measuring performance figures -I can conclude that it didn't handle the MSR_IA32_PAT correctly for both Intel -and AMD. Call this fair enough, if you will, it didn't flag any concerns, in -fact games such as Quake2 and Quake3 were still within playable frame rate of -40~60FPS on Win2k/XP guest. Until the same games were run on Win98/ME guest and -the frame rate blew off the roof (300~500FPS) on the same CPU and GPU. In fact, -the later seemed to be more inlined with runnng the games bare-metal with vsync -off. - -On Linux (at the time of writing kernel 5.6.7/Mesa 20.0), the difference -prevailed. Intel CPUs (and it so happened that I was on laptop with Intel GPU), -the VMX-based kvm_intel got it right while SVM-based kvm_amd did not. -To put this in simple exaggeration, an aging Core i3-4010U/HD Graphics 4400 -(Haswell GT2) exhibited an insane performance in Quake2/Quake3 timedemos that -totally crushed more recent AMD Ryzen 2500U APU/Vega 8 Graphics and AMD -FX8300/NVIDIA GT730 on desktop. Simply unbelievable! - -It turned out that there was something to do with AMD-V NPT. By loading kvm_amd -with npt=0, AMD Ryzen APU and FX8300 regained a huge performance leap. However, -AMD NPT issue with KVM was supposedly fixed in 2017 kernel commits. NPT=0 would -actually incur performance loss for VM due to intervention required by -hypervisors to maintain the shadow page tables. Finally, I was able to find the -pointer that pointed to MSR_IA32_PAT register. By updating the MSR_IA32_PAT to -0x0606xxxx0606xxxxULL, AMD CPUs now regain their rightful performance without -taking the hit of NPT=0 for Linux KVM. Taking the same solution into Windows, -both Intel and AMD CPUs no longer require Win98/ME guest to unleash the full -performance potentials and performance figures based on games measured on WHPX -were not very far behind Linux KVM. 
- -So I guess the problem lies in host/guest shared memory regions mapped as -uncacheable from virtual CPU perspective. As virtual CPUs now completely execute -in hardware context with x86 hardware virtualiztion extensions, the cacheability -of memory types would severely impact the performance on guests. WHPX didn't -handle it for both Intel EPT and AMD NPT, but KVM seems to do it right for Intel -EPT. I don't have the correct fix for QEMU. But what I can do for my 3D APIs -pass-through device models is to implement host-side hooks to reprogram and -restore MSR_IA32_PAT upon activation/deactivation of the 3D APIs. Perhaps there -is also a better solution of having the proper kernel drivers for virtual -interfaces to manage the memory types of host/guest shared memory in kernel -space, but to do that and the needs of Microsoft tools/DDKs, I will just forget -it. The guest stubs uses the same kernel drivers included in 3Dfx drivers for -memory mapping and the virtual interfaces remain driver-less from Windows OS -perspective. Considering the current state of halting progress for QEMU native -virgil3D to support Windows OS, I am just being pragmatic. I understand that -QEMU virgil3D will eventually bring 3D acceleration for Windows guests, but I do -not expect anything to support legacy 32-bit Windows OSes which have out-grown -their commercial usefulness. 
- -Regards, -KJ Liew - diff --git a/results/classifier/016/virtual/57231878 b/results/classifier/016/virtual/57231878 deleted file mode 100644 index f18786d1..00000000 --- a/results/classifier/016/virtual/57231878 +++ /dev/null @@ -1,269 +0,0 @@ -virtual: 0.932 -x86: 0.926 -hypervisor: 0.360 -user-level: 0.314 -debug: 0.271 -operating system: 0.200 -TCG: 0.039 -kernel: 0.039 -PID: 0.025 -files: 0.025 -boot: 0.023 -register: 0.020 -network: 0.018 -VMM: 0.014 -socket: 0.013 -semantic: 0.012 -device: 0.012 -alpha: 0.006 -architecture: 0.005 -performance: 0.005 -KVM: 0.003 -risc-v: 0.003 -assembly: 0.003 -graphic: 0.002 -vnc: 0.002 -mistranslation: 0.001 -peripherals: 0.001 -permissions: 0.001 -ppc: 0.000 -i386: 0.000 -arm: 0.000 - -[Qemu-devel] [BUG] qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed. - -Hello all, -I wanted to submit a bug report in the tracker, but it seem to require -an Ubuntu One account, which I'm having trouble with, so I'll just -give it here and hopefully somebody can make use of it. The issue -seems to be in an experimental format, so it's likely not very -consequential anyway. - -For the sake of anyone else simply googling for a workaround, I'll -just paste in the (cleaned up) brief IRC conversation about my issue -from the official channel: -<quy> I'm using QEMU version 2.12.0 on an x86_64 host (Arch Linux, -Kernel v4.17.2), and I'm trying to create an x86_64 virtual machine -(FreeBSD-11.1). The VM always aborts at the same point in the -installation (downloading 'ports.tgz') with the following error -message: -"qemu-system-x86_64: /build/qemu/src/qemu-2.12.0/block/qed.c:1197: -qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed. 
-zsh: abort (core dumped) qemu-system-x86_64 -smp 2 -m 4096 --enable-kvm -hda freebsd/freebsd.qed -devic" -The commands I ran to create the machine are as follows: -"qemu-img create -f qed freebsd/freebsd.qed 16G" -"qemu-system-x86_64 -smp 2 -m 4096 -enable-kvm -hda -freebsd/freebsd.qed -device e1000,netdev=net0 -netdev user,id=net0 --cdrom FreeBSD-11.1-RELEASE-amd64-bootonly.iso -boot order=d" -I tried adding logging options with the -d flag, but I didn't get -anything that seemed relevant, since I'm not sure what to look for. -<stsquad> ohh what's a qed device? -<stsquad> quy: it might be a workaround to use a qcow2 image for now -<stsquad> ahh the wiki has a statement "It is not recommended to use -QED for any new images. " -<danpb> 'qed' was an experimental disk image format created by IBM -before qcow2 v3 came along -<danpb> honestly nothing should ever use QED these days -<danpb> the good ideas from QED became qcow2v3 -<stsquad> danpb: sounds like we should put a warning on the option to -remind users of that fact -<danpb> quy: sounds like qed driver is simply broken - please do file -a bug against qemu bug tracker -<danpb> quy: but you should also really switch to qcow2 -<quy> I see; some people need to update their wikis then. I don't -remember where which guide I read when I first learned what little -QEMU I know, but I remember it specifically remember it saying QED was -the newest and most optimal format. -<stsquad> quy: we can only be responsible for our own wiki I'm afraid... -<danpb> if you remember where you saw that please let us know so we -can try to get it fixed -<quy> Thank you very much for the info; I will switch to QCOW. -Unfortunately, I'm not sure if I will be able to file any bug reports -in the tracker as I can't seem to log Launchpad, which it seems to -require. 
-<danpb> quy: an email to the mailing list would suffice too if you -can't deal with launchpad -<danpb> kwolf: ^^^ in case you're interested in possible QED -assertions from 2.12 - -If any more info is needed, feel free to email me; I'm not actually -subscribed to this list though. -Thank you, -Quytelda Kahja - -CC Qemu Block; looks like QED is a bit busted. - -On 06/27/2018 10:25 AM, Quytelda Kahja wrote: -> -Hello all, -> -I wanted to submit a bug report in the tracker, but it seem to require -> -an Ubuntu One account, which I'm having trouble with, so I'll just -> -give it here and hopefully somebody can make use of it. The issue -> -seems to be in an experimental format, so it's likely not very -> -consequential anyway. -> -> -For the sake of anyone else simply googling for a workaround, I'll -> -just paste in the (cleaned up) brief IRC conversation about my issue -> -from the official channel: -> -<quy> I'm using QEMU version 2.12.0 on an x86_64 host (Arch Linux, -> -Kernel v4.17.2), and I'm trying to create an x86_64 virtual machine -> -(FreeBSD-11.1). The VM always aborts at the same point in the -> -installation (downloading 'ports.tgz') with the following error -> -message: -> -"qemu-system-x86_64: /build/qemu/src/qemu-2.12.0/block/qed.c:1197: -> -qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed. -> -zsh: abort (core dumped) qemu-system-x86_64 -smp 2 -m 4096 -> --enable-kvm -hda freebsd/freebsd.qed -devic" -> -The commands I ran to create the machine are as follows: -> -"qemu-img create -f qed freebsd/freebsd.qed 16G" -> -"qemu-system-x86_64 -smp 2 -m 4096 -enable-kvm -hda -> -freebsd/freebsd.qed -device e1000,netdev=net0 -netdev user,id=net0 -> --cdrom FreeBSD-11.1-RELEASE-amd64-bootonly.iso -boot order=d" -> -I tried adding logging options with the -d flag, but I didn't get -> -anything that seemed relevant, since I'm not sure what to look for. -> -<stsquad> ohh what's a qed device? 
-> -<stsquad> quy: it might be a workaround to use a qcow2 image for now -> -<stsquad> ahh the wiki has a statement "It is not recommended to use -> -QED for any new images. " -> -<danpb> 'qed' was an experimental disk image format created by IBM -> -before qcow2 v3 came along -> -<danpb> honestly nothing should ever use QED these days -> -<danpb> the good ideas from QED became qcow2v3 -> -<stsquad> danpb: sounds like we should put a warning on the option to -> -remind users of that fact -> -<danpb> quy: sounds like qed driver is simply broken - please do file -> -a bug against qemu bug tracker -> -<danpb> quy: but you should also really switch to qcow2 -> -<quy> I see; some people need to update their wikis then. I don't -> -remember where which guide I read when I first learned what little -> -QEMU I know, but I remember it specifically remember it saying QED was -> -the newest and most optimal format. -> -<stsquad> quy: we can only be responsible for our own wiki I'm afraid... -> -<danpb> if you remember where you saw that please let us know so we -> -can try to get it fixed -> -<quy> Thank you very much for the info; I will switch to QCOW. -> -Unfortunately, I'm not sure if I will be able to file any bug reports -> -in the tracker as I can't seem to log Launchpad, which it seems to -> -require. -> -<danpb> quy: an email to the mailing list would suffice too if you -> -can't deal with launchpad -> -<danpb> kwolf: ^^^ in case you're interested in possible QED -> -assertions from 2.12 -> -> -If any more info is needed, feel free to email me; I'm not actually -> -subscribed to this list though. -> -Thank you, -> -Quytelda Kahja -> - -On 06/29/2018 03:07 PM, John Snow wrote: -CC Qemu Block; looks like QED is a bit busted. 
- -On 06/27/2018 10:25 AM, Quytelda Kahja wrote: -Hello all, -I wanted to submit a bug report in the tracker, but it seem to require -an Ubuntu One account, which I'm having trouble with, so I'll just -give it here and hopefully somebody can make use of it. The issue -seems to be in an experimental format, so it's likely not very -consequential anyway. -Analysis in another thread may be relevant: -https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08963.html --- -Eric Blake, Principal Software Engineer -Red Hat, Inc. +1-919-301-3266 -Virtualization: qemu.org | libvirt.org - -Am 29.06.2018 um 22:16 hat Eric Blake geschrieben: -> -On 06/29/2018 03:07 PM, John Snow wrote: -> -> CC Qemu Block; looks like QED is a bit busted. -> -> -> -> On 06/27/2018 10:25 AM, Quytelda Kahja wrote: -> -> > Hello all, -> -> > I wanted to submit a bug report in the tracker, but it seem to require -> -> > an Ubuntu One account, which I'm having trouble with, so I'll just -> -> > give it here and hopefully somebody can make use of it. The issue -> -> > seems to be in an experimental format, so it's likely not very -> -> > consequential anyway. -> -> -Analysis in another thread may be relevant: -> -> -https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08963.html -The assertion there was: - -qemu-system-x86_64: block.c:3434: bdrv_replace_node: Assertion -`!atomic_read(&to->in_flight)' failed. - -Which quite clearly pointed to a drain bug. This one, however, doesn't -seem to be related to drain, so I think it's probably a different bug. 
- -Kevin - diff --git a/results/classifier/016/virtual/67821138 b/results/classifier/016/virtual/67821138 deleted file mode 100644 index 62ec4fff..00000000 --- a/results/classifier/016/virtual/67821138 +++ /dev/null @@ -1,226 +0,0 @@ -virtual: 0.897 -hypervisor: 0.505 -debug: 0.461 -PID: 0.283 -operating system: 0.187 -KVM: 0.099 -kernel: 0.073 -VMM: 0.070 -TCG: 0.049 -register: 0.044 -x86: 0.036 -permissions: 0.032 -files: 0.027 -risc-v: 0.025 -device: 0.017 -user-level: 0.014 -i386: 0.013 -alpha: 0.013 -socket: 0.010 -assembly: 0.009 -network: 0.007 -ppc: 0.007 -architecture: 0.006 -vnc: 0.006 -semantic: 0.005 -arm: 0.004 -graphic: 0.004 -performance: 0.004 -peripherals: 0.002 -boot: 0.001 -mistranslation: 0.000 - -[BUG, RFC] Base node is in RW after making external snapshot - -Hi everyone, - -When making an external snapshot, we end up in a situation when 2 block -graph nodes related to the same image file (format and storage nodes) -have different RO flags set on them. - -E.g. - -# ls -la /proc/PID/fd -lrwx------ 1 root qemu 64 Apr 24 20:14 12 -> /path/to/harddisk.hdd - -# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' ---pretty | egrep '"node-name"|"ro"' - "ro": false, - "node-name": "libvirt-1-format", - "ro": false, - "node-name": "libvirt-1-storage", - -# virsh snapshot-create-as VM --name snap --disk-only -Domain snapshot snap created - -# ls -la /proc/PID/fd -lr-x------ 1 root qemu 64 Apr 24 20:14 134 -> /path/to/harddisk.hdd -lrwx------ 1 root qemu 64 Apr 24 20:14 135 -> /path/to/harddisk.snap - -# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' ---pretty | egrep '"node-name"|"ro"' - "ro": false, - "node-name": "libvirt-2-format", - "ro": false, - "node-name": "libvirt-2-storage", - "ro": true, - "node-name": "libvirt-1-format", - "ro": false, <-------------- - "node-name": "libvirt-1-storage", - -File descriptor has been reopened in RO, but "libvirt-1-storage" node -still has RW permissions set. 
- -I'm wondering it this a bug or this is intended? Looks like a bug to -me, although I see that some iotests (e.g. 273) expect 2 nodes related -to the same image file to have different RO flags. - -bdrv_reopen_set_read_only() - bdrv_reopen() - bdrv_reopen_queue() - bdrv_reopen_queue_child() - bdrv_reopen_multiple() - bdrv_list_refresh_perms() - bdrv_topological_dfs() - bdrv_do_refresh_perms() - bdrv_reopen_commit() - -In the stack above bdrv_reopen_set_read_only() is only being called for -the parent (libvirt-1-format) node. There're 2 lists: BDSs from -refresh_list are used by bdrv_drv_set_perm and this leads to actual -reopen with RO of the file descriptor. And then there's reopen queue -bs_queue -- BDSs from this queue get their parameters updated. While -refresh_list ends up having the whole subtree (including children, this -is done in bdrv_topological_dfs()) bs_queue only has the parent. And -that is because storage (child) node's (bs->inherits_from == NULL), so -bdrv_reopen_queue_child() never adds it to the queue. Could it be the -source of this bug? - -Anyway, would greatly appreciate a clarification. - -Andrey - -On 4/24/24 21:00, Andrey Drobyshev wrote: -> -Hi everyone, -> -> -When making an external snapshot, we end up in a situation when 2 block -> -graph nodes related to the same image file (format and storage nodes) -> -have different RO flags set on them. -> -> -E.g. 
-> -> -# ls -la /proc/PID/fd -> -lrwx------ 1 root qemu 64 Apr 24 20:14 12 -> /path/to/harddisk.hdd -> -> -# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' -> ---pretty | egrep '"node-name"|"ro"' -> -"ro": false, -> -"node-name": "libvirt-1-format", -> -"ro": false, -> -"node-name": "libvirt-1-storage", -> -> -# virsh snapshot-create-as VM --name snap --disk-only -> -Domain snapshot snap created -> -> -# ls -la /proc/PID/fd -> -lr-x------ 1 root qemu 64 Apr 24 20:14 134 -> /path/to/harddisk.hdd -> -lrwx------ 1 root qemu 64 Apr 24 20:14 135 -> /path/to/harddisk.snap -> -> -# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' -> ---pretty | egrep '"node-name"|"ro"' -> -"ro": false, -> -"node-name": "libvirt-2-format", -> -"ro": false, -> -"node-name": "libvirt-2-storage", -> -"ro": true, -> -"node-name": "libvirt-1-format", -> -"ro": false, <-------------- -> -"node-name": "libvirt-1-storage", -> -> -File descriptor has been reopened in RO, but "libvirt-1-storage" node -> -still has RW permissions set. -> -> -I'm wondering it this a bug or this is intended? Looks like a bug to -> -me, although I see that some iotests (e.g. 273) expect 2 nodes related -> -to the same image file to have different RO flags. -> -> -bdrv_reopen_set_read_only() -> -bdrv_reopen() -> -bdrv_reopen_queue() -> -bdrv_reopen_queue_child() -> -bdrv_reopen_multiple() -> -bdrv_list_refresh_perms() -> -bdrv_topological_dfs() -> -bdrv_do_refresh_perms() -> -bdrv_reopen_commit() -> -> -In the stack above bdrv_reopen_set_read_only() is only being called for -> -the parent (libvirt-1-format) node. There're 2 lists: BDSs from -> -refresh_list are used by bdrv_drv_set_perm and this leads to actual -> -reopen with RO of the file descriptor. And then there's reopen queue -> -bs_queue -- BDSs from this queue get their parameters updated. 
While -> -refresh_list ends up having the whole subtree (including children, this -> -is done in bdrv_topological_dfs()) bs_queue only has the parent. And -> -that is because storage (child) node's (bs->inherits_from == NULL), so -> -bdrv_reopen_queue_child() never adds it to the queue. Could it be the -> -source of this bug? -> -> -Anyway, would greatly appreciate a clarification. -> -> -Andrey -Friendly ping. Could somebody confirm that it is a bug indeed? - diff --git a/results/classifier/016/virtual/70021271 b/results/classifier/016/virtual/70021271 deleted file mode 100644 index 5563eb0f..00000000 --- a/results/classifier/016/virtual/70021271 +++ /dev/null @@ -1,7475 +0,0 @@ -virtual: 0.943 -debug: 0.870 -x86: 0.251 -hypervisor: 0.249 -device: 0.054 -files: 0.053 -kernel: 0.045 -register: 0.044 -TCG: 0.040 -PID: 0.037 -i386: 0.035 -operating system: 0.032 -VMM: 0.030 -ppc: 0.022 -assembly: 0.022 -KVM: 0.017 -user-level: 0.014 -peripherals: 0.014 -performance: 0.014 -boot: 0.009 -risc-v: 0.009 -network: 0.007 -arm: 0.007 -architecture: 0.005 -semantic: 0.005 -socket: 0.005 -permissions: 0.004 -alpha: 0.003 -graphic: 0.003 -vnc: 0.002 -mistranslation: 0.001 - -[Qemu-devel] [BUG]Unassigned mem write during pci device hot-plug - -Hi all, - -In our test, we configured VM with several pci-bridges and a virtio-net nic -been attached with bus 4, -After VM is startup, We ping this nic from host to judge if it is working -normally. Then, we hot add pci devices to this VM with bus 0. 
-We found the virtio-net NIC in bus 4 is not working (can not connect) -occasionally, as it kick virtio backend failure with error below: - Unassigned mem write 00000000fc803004 = 0x1 - -memory-region: pci_bridge_pci - 0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci - 00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci - 00000000fc800000-00000000fc800fff (prio 0, RW): virtio-pci-common - 00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr - 00000000fc802000-00000000fc802fff (prio 0, RW): virtio-pci-device - 00000000fc803000-00000000fc803fff (prio 0, RW): virtio-pci-notify <- io -mem unassigned - ... - -We caught an exceptional address changing while this problem happened, show as -follow: -Before pci_bridge_update_mappings: - 00000000fc000000-00000000fc1fffff (prio 1, RW): alias pci_bridge_pref_mem -@pci_bridge_pci 00000000fc000000-00000000fc1fffff - 00000000fc200000-00000000fc3fffff (prio 1, RW): alias pci_bridge_pref_mem -@pci_bridge_pci 00000000fc200000-00000000fc3fffff - 00000000fc400000-00000000fc5fffff (prio 1, RW): alias pci_bridge_pref_mem -@pci_bridge_pci 00000000fc400000-00000000fc5fffff - 00000000fc600000-00000000fc7fffff (prio 1, RW): alias pci_bridge_pref_mem -@pci_bridge_pci 00000000fc600000-00000000fc7fffff - 00000000fc800000-00000000fc9fffff (prio 1, RW): alias pci_bridge_pref_mem -@pci_bridge_pci 00000000fc800000-00000000fc9fffff <- correct Adress Spce - 00000000fca00000-00000000fcbfffff (prio 1, RW): alias pci_bridge_pref_mem -@pci_bridge_pci 00000000fca00000-00000000fcbfffff - 00000000fcc00000-00000000fcdfffff (prio 1, RW): alias pci_bridge_pref_mem -@pci_bridge_pci 00000000fcc00000-00000000fcdfffff - 00000000fce00000-00000000fcffffff (prio 1, RW): alias pci_bridge_pref_mem -@pci_bridge_pci 00000000fce00000-00000000fcffffff - -After pci_bridge_update_mappings: - 00000000fda00000-00000000fdbfffff (prio 1, RW): alias pci_bridge_mem -@pci_bridge_pci 00000000fda00000-00000000fdbfffff - 00000000fdc00000-00000000fddfffff
(prio 1, RW): alias pci_bridge_mem -@pci_bridge_pci 00000000fdc00000-00000000fddfffff - 00000000fde00000-00000000fdffffff (prio 1, RW): alias pci_bridge_mem -@pci_bridge_pci 00000000fde00000-00000000fdffffff - 00000000fe000000-00000000fe1fffff (prio 1, RW): alias pci_bridge_mem -@pci_bridge_pci 00000000fe000000-00000000fe1fffff - 00000000fe200000-00000000fe3fffff (prio 1, RW): alias pci_bridge_mem -@pci_bridge_pci 00000000fe200000-00000000fe3fffff - 00000000fe400000-00000000fe5fffff (prio 1, RW): alias pci_bridge_mem -@pci_bridge_pci 00000000fe400000-00000000fe5fffff - 00000000fe600000-00000000fe7fffff (prio 1, RW): alias pci_bridge_mem -@pci_bridge_pci 00000000fe600000-00000000fe7fffff - 00000000fe800000-00000000fe9fffff (prio 1, RW): alias pci_bridge_mem -@pci_bridge_pci 00000000fe800000-00000000fe9fffff - fffffffffc800000-fffffffffc800000 (prio 1, RW): alias pci_bridge_pref_mem -@pci_bridge_pci fffffffffc800000-fffffffffc800000 <- Exceptional Adress Space - -We have figured out why this address becomes this value, according to pci -spec, pci driver can get BAR address size by writing 0xffffffff to -the pci register firstly, and then read back the value from this register. 
-QEMU does not handle this value specially when processing the PCI config
-write. The function call stack is:
-pci_bridge_dev_write_config
--> pci_bridge_write_config
--> pci_default_write_config (this is where the config[address] value is updated
-   to fffffffffc800000, when it should be 0xfc800000)
--> pci_bridge_update_mappings
--> pci_bridge_region_del(br, br->windows);
--> pci_bridge_region_init
--> pci_bridge_init_alias (here, via pci_bridge_get_base, we use the wrong value
-   fffffffffc800000)
--> memory_region_transaction_commit
-
-So, as we can see, QEMU uses the wrong base address to update the memory
-regions. Although the base address is updated to the correct value once the PCI
-driver in the VM writes the original value back, the virtio NIC on bus 4 may
-still send network packets concurrently, while the wrong memory region address
-is in place.
-
-We tried skipping the memory region update in QEMU when a PCI write with the
-value 0xffffffff is detected, and it does work, but this does not seem like an
-elegant fix.
-
-diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
-index b2e50c3..84b405d 100644
---- a/hw/pci/pci_bridge.c
-+++ b/hw/pci/pci_bridge.c
-@@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
-     pci_default_write_config(d, address, val, len);
--    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
-+    if ( (val != 0xffffffff) &&
-+        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
-         /* io base/limit */
-         ranges_overlap(address, len, PCI_IO_BASE, 2) ||
-@@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
-         ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
-         /* vga enable */
--        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
-+        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
-         pci_bridge_update_mappings(s);
-     }
-
-Thanks,
-Xu
-
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
-> Hi all,
->
-> In our test, we configured VM with several pci-bridges and a virtio-net nic
-> been attached with bus 4,
->
-> After VM is startup, We ping this nic
from host to judge if it is working
-> normally. Then, we hot add pci devices to this VM with bus 0.
->
-> We found the virtio-net NIC in bus 4 is not working (can not connect)
-> occasionally, as it kick virtio backend failure with error below:
->
-> Unassigned mem write 00000000fc803004 = 0x1
-Thanks for the report. Which guest was used to produce this problem?
-
--- 
-MST
-
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
-> > Hi all,
-> >
-> > In our test, we configured VM with several pci-bridges and a
-> > virtio-net nic been attached with bus 4,
-> >
-> > After VM is startup, We ping this nic from host to judge if it is
-> > working normally. Then, we hot add pci devices to this VM with bus 0.
-> >
-> > We found the virtio-net NIC in bus 4 is not working (can not connect)
-> > occasionally, as it kick virtio backend failure with error below:
-> >
-> > Unassigned mem write 00000000fc803004 = 0x1
->
-> Thanks for the report. Which guest was used to produce this problem?
->
-> --
-> MST
-I saw this problem when I hot-plugged a VFIO device into a CentOS 7.4 guest.
-After that I compiled the latest Linux kernel, and it also has this problem.
-
-Thanks,
-Xu
-
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
-> Hi all,
->
-> In our test, we configured VM with several pci-bridges and a virtio-net nic
-> been attached with bus 4,
->
-> After VM is startup, We ping this nic from host to judge if it is working
-> normally. Then, we hot add pci devices to this VM with bus 0.
-> -> -We found the virtio-net NIC in bus 4 is not working (can not connect) -> -occasionally, as it kick virtio backend failure with error below: -> -> -Unassigned mem write 00000000fc803004 = 0x1 -> -> -> -> -memory-region: pci_bridge_pci -> -> -0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci -> -> -00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci -> -> -00000000fc800000-00000000fc800fff (prio 0, RW): virtio-pci-common -> -> -00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr -> -> -00000000fc802000-00000000fc802fff (prio 0, RW): virtio-pci-device -> -> -00000000fc803000-00000000fc803fff (prio 0, RW): virtio-pci-notify <- io -> -mem unassigned -> -> -⦠-> -> -> -> -We caught an exceptional address changing while this problem happened, show as -> -follow: -> -> -Before pci_bridge_update_mappingsï¼ -> -> -00000000fc000000-00000000fc1fffff (prio 1, RW): alias -> -pci_bridge_pref_mem -> -@pci_bridge_pci 00000000fc000000-00000000fc1fffff -> -> -00000000fc200000-00000000fc3fffff (prio 1, RW): alias -> -pci_bridge_pref_mem -> -@pci_bridge_pci 00000000fc200000-00000000fc3fffff -> -> -00000000fc400000-00000000fc5fffff (prio 1, RW): alias -> -pci_bridge_pref_mem -> -@pci_bridge_pci 00000000fc400000-00000000fc5fffff -> -> -00000000fc600000-00000000fc7fffff (prio 1, RW): alias -> -pci_bridge_pref_mem -> -@pci_bridge_pci 00000000fc600000-00000000fc7fffff -> -> -00000000fc800000-00000000fc9fffff (prio 1, RW): alias -> -pci_bridge_pref_mem -> -@pci_bridge_pci 00000000fc800000-00000000fc9fffff <- correct Adress Spce -> -> -00000000fca00000-00000000fcbfffff (prio 1, RW): alias -> -pci_bridge_pref_mem -> -@pci_bridge_pci 00000000fca00000-00000000fcbfffff -> -> -00000000fcc00000-00000000fcdfffff (prio 1, RW): alias -> -pci_bridge_pref_mem -> -@pci_bridge_pci 00000000fcc00000-00000000fcdfffff -> -> -00000000fce00000-00000000fcffffff (prio 1, RW): alias -> -pci_bridge_pref_mem -> -@pci_bridge_pci 00000000fce00000-00000000fcffffff -> -> -> -> -After 
pci_bridge_update_mappingsï¼ -> -> -00000000fda00000-00000000fdbfffff (prio 1, RW): alias pci_bridge_mem -> -@pci_bridge_pci 00000000fda00000-00000000fdbfffff -> -> -00000000fdc00000-00000000fddfffff (prio 1, RW): alias pci_bridge_mem -> -@pci_bridge_pci 00000000fdc00000-00000000fddfffff -> -> -00000000fde00000-00000000fdffffff (prio 1, RW): alias pci_bridge_mem -> -@pci_bridge_pci 00000000fde00000-00000000fdffffff -> -> -00000000fe000000-00000000fe1fffff (prio 1, RW): alias pci_bridge_mem -> -@pci_bridge_pci 00000000fe000000-00000000fe1fffff -> -> -00000000fe200000-00000000fe3fffff (prio 1, RW): alias pci_bridge_mem -> -@pci_bridge_pci 00000000fe200000-00000000fe3fffff -> -> -00000000fe400000-00000000fe5fffff (prio 1, RW): alias pci_bridge_mem -> -@pci_bridge_pci 00000000fe400000-00000000fe5fffff -> -> -00000000fe600000-00000000fe7fffff (prio 1, RW): alias pci_bridge_mem -> -@pci_bridge_pci 00000000fe600000-00000000fe7fffff -> -> -00000000fe800000-00000000fe9fffff (prio 1, RW): alias pci_bridge_mem -> -@pci_bridge_pci 00000000fe800000-00000000fe9fffff -> -> -fffffffffc800000-fffffffffc800000 (prio 1, RW): alias -> -pci_bridge_pref_mem -> -@pci_bridge_pci fffffffffc800000-fffffffffc800000 <- Exceptional Adress -> -Space -This one is empty though right? - -> -> -> -We have figured out why this address becomes this value, according to pci -> -spec, pci driver can get BAR address size by writing 0xffffffff to -> -> -the pci register firstly, and then read back the value from this register. -OK however as you show below the BAR being sized is the BAR -if a bridge. Are you then adding a bridge device by hotplug? 
- - - -> -We didn't handle this value specially while process pci write in qemu, the -> -function call stack is: -> -> -Pci_bridge_dev_write_config -> -> --> pci_bridge_write_config -> -> --> pci_default_write_config (we update the config[address] value here to -> -fffffffffc800000, which should be 0xfc800000 ) -> -> --> pci_bridge_update_mappings -> -> -->pci_bridge_region_del(br, br->windows); -> -> --> pci_bridge_region_init -> -> --> -> -pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong value -> -fffffffffc800000) -> -> --> -> -memory_region_transaction_commit -> -> -> -> -So, as we can see, we use the wrong base address in qemu to update the memory -> -regions, though, we update the base address to -> -> -The correct value after pci driver in VM write the original value back, the -> -virtio NIC in bus 4 may still sends net packets concurrently with -> -> -The wrong memory region address. -> -> -> -> -We have tried to skip the memory region update action in qemu while detect pci -> -write with 0xffffffff value, and it does work, but -> -> -This seems to be not gently. -For sure. But I'm still puzzled as to why does Linux try to -size the BAR of the bridge while a device behind it is -used. - -Can you pls post your QEMU command line? 
- - - -> -> -> -diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c -> -> -index b2e50c3..84b405d 100644 -> -> ---- a/hw/pci/pci_bridge.c -> -> -+++ b/hw/pci/pci_bridge.c -> -> -@@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d, -> -> -pci_default_write_config(d, address, val, len); -> -> -- if (ranges_overlap(address, len, PCI_COMMAND, 2) || -> -> -+ if ( (val != 0xffffffff) && -> -> -+ (ranges_overlap(address, len, PCI_COMMAND, 2) || -> -> -/* io base/limit */ -> -> -ranges_overlap(address, len, PCI_IO_BASE, 2) || -> -> -@@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d, -> -> -ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || -> -> -/* vga enable */ -> -> -- ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { -> -> -+ ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) { -> -> -pci_bridge_update_mappings(s); -> -> -} -> -> -> -> -Thinks, -> -> -Xu -> - -On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: -> -> Hi all, -> -> -> -> -> -> -> -> In our test, we configured VM with several pci-bridges and a -> -> virtio-net nic been attached with bus 4, -> -> -> -> After VM is startup, We ping this nic from host to judge if it is -> -> working normally. Then, we hot add pci devices to this VM with bus 0. 
-> -> -> -> We found the virtio-net NIC in bus 4 is not working (can not connect) -> -> occasionally, as it kick virtio backend failure with error below: -> -> -> -> Unassigned mem write 00000000fc803004 = 0x1 -> -> -> -> -> -> -> -> memory-region: pci_bridge_pci -> -> -> -> 0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci -> -> -> -> 00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci -> -> -> -> 00000000fc800000-00000000fc800fff (prio 0, RW): -> -> virtio-pci-common -> -> -> -> 00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr -> -> -> -> 00000000fc802000-00000000fc802fff (prio 0, RW): -> -> virtio-pci-device -> -> -> -> 00000000fc803000-00000000fc803fff (prio 0, RW): -> -> virtio-pci-notify <- io mem unassigned -> -> -> -> ⦠-> -> -> -> -> -> -> -> We caught an exceptional address changing while this problem happened, -> -> show as -> -> follow: -> -> -> -> Before pci_bridge_update_mappingsï¼ -> -> -> -> 00000000fc000000-00000000fc1fffff (prio 1, RW): alias -> -> pci_bridge_pref_mem @pci_bridge_pci 00000000fc000000-00000000fc1fffff -> -> -> -> 00000000fc200000-00000000fc3fffff (prio 1, RW): alias -> -> pci_bridge_pref_mem @pci_bridge_pci 00000000fc200000-00000000fc3fffff -> -> -> -> 00000000fc400000-00000000fc5fffff (prio 1, RW): alias -> -> pci_bridge_pref_mem @pci_bridge_pci 00000000fc400000-00000000fc5fffff -> -> -> -> 00000000fc600000-00000000fc7fffff (prio 1, RW): alias -> -> pci_bridge_pref_mem @pci_bridge_pci 00000000fc600000-00000000fc7fffff -> -> -> -> 00000000fc800000-00000000fc9fffff (prio 1, RW): alias -> -> pci_bridge_pref_mem @pci_bridge_pci 00000000fc800000-00000000fc9fffff -> -> <- correct Adress Spce -> -> -> -> 00000000fca00000-00000000fcbfffff (prio 1, RW): alias -> -> pci_bridge_pref_mem @pci_bridge_pci 00000000fca00000-00000000fcbfffff -> -> -> -> 00000000fcc00000-00000000fcdfffff (prio 1, RW): alias -> -> pci_bridge_pref_mem @pci_bridge_pci 00000000fcc00000-00000000fcdfffff -> -> -> -> 
00000000fce00000-00000000fcffffff (prio 1, RW): alias -> -> pci_bridge_pref_mem @pci_bridge_pci 00000000fce00000-00000000fcffffff -> -> -> -> -> -> -> -> After pci_bridge_update_mappingsï¼ -> -> -> -> 00000000fda00000-00000000fdbfffff (prio 1, RW): alias -> -> pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff -> -> -> -> 00000000fdc00000-00000000fddfffff (prio 1, RW): alias -> -> pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff -> -> -> -> 00000000fde00000-00000000fdffffff (prio 1, RW): alias -> -> pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff -> -> -> -> 00000000fe000000-00000000fe1fffff (prio 1, RW): alias -> -> pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff -> -> -> -> 00000000fe200000-00000000fe3fffff (prio 1, RW): alias -> -> pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff -> -> -> -> 00000000fe400000-00000000fe5fffff (prio 1, RW): alias -> -> pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff -> -> -> -> 00000000fe600000-00000000fe7fffff (prio 1, RW): alias -> -> pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff -> -> -> -> 00000000fe800000-00000000fe9fffff (prio 1, RW): alias -> -> pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff -> -> -> -> fffffffffc800000-fffffffffc800000 (prio 1, RW): alias -> -> pci_bridge_pref_mem -> -> @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- Exceptional Adress -> -Space -> -> -This one is empty though right? -> -> -> -> -> -> -> We have figured out why this address becomes this value, according to -> -> pci spec, pci driver can get BAR address size by writing 0xffffffff -> -> to -> -> -> -> the pci register firstly, and then read back the value from this register. -> -> -> -OK however as you show below the BAR being sized is the BAR if a bridge. Are -> -you then adding a bridge device by hotplug? 
-No, I simply hot-plugged a VFIO device into bus 0. Another interesting
-observation is that if I hot-plug the device into another bus, this does not
-happen.
-
-> >
-> > We didn't handle this value specially while process pci write in
-> > qemu, the function call stack is:
-> >
-> > Pci_bridge_dev_write_config
-> >
-> > -> pci_bridge_write_config
-> >
-> > -> pci_default_write_config (we update the config[address] value here to
-> > fffffffffc800000, which should be 0xfc800000 )
-> >
-> > -> pci_bridge_update_mappings
-> >
-> > ->pci_bridge_region_del(br, br->windows);
-> >
-> > -> pci_bridge_region_init
-> >
-> > -> pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong
-> > value fffffffffc800000)
-> >
-> > -> memory_region_transaction_commit
-> >
-> > So, as we can see, we use the wrong base address in qemu to update the
-> > memory regions, though, we update the base address to
-> >
-> > The correct value after pci driver in VM write the original value
-> > back, the virtio NIC in bus 4 may still sends net packets concurrently
-> > with
-> >
-> > The wrong memory region address.
-> >
-> > We have tried to skip the memory region update action in qemu while
-> > detect pci write with 0xffffffff value, and it does work, but
-> >
-> > This seems to be not gently.
->
-> For sure. But I'm still puzzled as to why does Linux try to size the BAR of
-> the bridge while a device behind it is used.
->
-> Can you pls post your QEMU command line?
-My QEMU command line: -/root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S -object -secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-Linux/master-key.aes - -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu -host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m -size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp -20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024 -numa -node,nodeid=1,cpus=5-9,mem=1024 -numa node,nodeid=2,cpus=10-14,mem=1024 -numa -node,nodeid=3,cpus=15-19,mem=1024 -uuid 34a588c7-b0f2-4952-b39c-47fae3411439 --no-user-config -nodefaults -chardev -socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/monitor.sock,server,nowait - -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet --global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on -device -pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device -pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device -pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device -pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device -pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device -piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device -usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device -nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device -virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device -virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device -virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device -virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device -virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive -file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-virtio-disk0,cache=none - -device -virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 - -drive if=none,id=drive-ide0-1-1,readonly=on,cache=none -device -ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev -tap,fd=35,id=hostnet0 -device 
-virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4,addr=0x1 --chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 --device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device -cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device -virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on - -I am also very curious about this issue, in the linux kernel code, maybe double -check in function pci_bridge_check_ranges triggered this problem. - - -> -> -> -> -> -> -> -> -> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c -> -> -> -> index b2e50c3..84b405d 100644 -> -> -> -> --- a/hw/pci/pci_bridge.c -> -> -> -> +++ b/hw/pci/pci_bridge.c -> -> -> -> @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d, -> -> -> -> pci_default_write_config(d, address, val, len); -> -> -> -> - if (ranges_overlap(address, len, PCI_COMMAND, 2) || -> -> -> -> + if ( (val != 0xffffffff) && -> -> -> -> + (ranges_overlap(address, len, PCI_COMMAND, 2) || -> -> -> -> /* io base/limit */ -> -> -> -> ranges_overlap(address, len, PCI_IO_BASE, 2) || -> -> -> -> @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d, -> -> -> -> ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || -> -> -> -> /* vga enable */ -> -> -> -> - ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { -> -> -> -> + ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) { -> -> -> -> pci_bridge_update_mappings(s); -> -> -> -> } -> -> -> -> -> -> -> -> Thinks, -> -> -> -> Xu -> -> - -On Mon, Dec 10, 2018 at 03:12:53AM +0000, xuyandong wrote: -> -On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: -> -> > Hi all, -> -> > -> -> > -> -> > -> -> > In our test, we configured VM with several pci-bridges and a -> -> > virtio-net nic been attached with bus 4, -> -> > -> -> > After VM is startup, We ping this nic from host to judge if it is -> -> > working normally. Then, we hot add pci devices to this VM with bus 0. 
-> -> > -> -> > We found the virtio-net NIC in bus 4 is not working (can not connect) -> -> > occasionally, as it kick virtio backend failure with error below: -> -> > -> -> > Unassigned mem write 00000000fc803004 = 0x1 -> -> > -> -> > -> -> > -> -> > memory-region: pci_bridge_pci -> -> > -> -> > 0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci -> -> > -> -> > 00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci -> -> > -> -> > 00000000fc800000-00000000fc800fff (prio 0, RW): -> -> > virtio-pci-common -> -> > -> -> > 00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr -> -> > -> -> > 00000000fc802000-00000000fc802fff (prio 0, RW): -> -> > virtio-pci-device -> -> > -> -> > 00000000fc803000-00000000fc803fff (prio 0, RW): -> -> > virtio-pci-notify <- io mem unassigned -> -> > -> -> > ⦠-> -> > -> -> > -> -> > -> -> > We caught an exceptional address changing while this problem happened, -> -> > show as -> -> > follow: -> -> > -> -> > Before pci_bridge_update_mappingsï¼ -> -> > -> -> > 00000000fc000000-00000000fc1fffff (prio 1, RW): alias -> -> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc000000-00000000fc1fffff -> -> > -> -> > 00000000fc200000-00000000fc3fffff (prio 1, RW): alias -> -> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc200000-00000000fc3fffff -> -> > -> -> > 00000000fc400000-00000000fc5fffff (prio 1, RW): alias -> -> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc400000-00000000fc5fffff -> -> > -> -> > 00000000fc600000-00000000fc7fffff (prio 1, RW): alias -> -> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc600000-00000000fc7fffff -> -> > -> -> > 00000000fc800000-00000000fc9fffff (prio 1, RW): alias -> -> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc800000-00000000fc9fffff -> -> > <- correct Adress Spce -> -> > -> -> > 00000000fca00000-00000000fcbfffff (prio 1, RW): alias -> -> > pci_bridge_pref_mem @pci_bridge_pci 00000000fca00000-00000000fcbfffff -> -> > -> -> > 00000000fcc00000-00000000fcdfffff (prio 1, RW): alias 
-> -> > pci_bridge_pref_mem @pci_bridge_pci 00000000fcc00000-00000000fcdfffff -> -> > -> -> > 00000000fce00000-00000000fcffffff (prio 1, RW): alias -> -> > pci_bridge_pref_mem @pci_bridge_pci 00000000fce00000-00000000fcffffff -> -> > -> -> > -> -> > -> -> > After pci_bridge_update_mappingsï¼ -> -> > -> -> > 00000000fda00000-00000000fdbfffff (prio 1, RW): alias -> -> > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff -> -> > -> -> > 00000000fdc00000-00000000fddfffff (prio 1, RW): alias -> -> > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff -> -> > -> -> > 00000000fde00000-00000000fdffffff (prio 1, RW): alias -> -> > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff -> -> > -> -> > 00000000fe000000-00000000fe1fffff (prio 1, RW): alias -> -> > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff -> -> > -> -> > 00000000fe200000-00000000fe3fffff (prio 1, RW): alias -> -> > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff -> -> > -> -> > 00000000fe400000-00000000fe5fffff (prio 1, RW): alias -> -> > pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff -> -> > -> -> > 00000000fe600000-00000000fe7fffff (prio 1, RW): alias -> -> > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff -> -> > -> -> > 00000000fe800000-00000000fe9fffff (prio 1, RW): alias -> -> > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff -> -> > -> -> > fffffffffc800000-fffffffffc800000 (prio 1, RW): alias -> -> > pci_bridge_pref_mem -> -> > @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- Exceptional Adress -> -> Space -> -> -> -> This one is empty though right? -> -> -> -> > -> -> > -> -> > We have figured out why this address becomes this value, according to -> -> > pci spec, pci driver can get BAR address size by writing 0xffffffff -> -> > to -> -> > -> -> > the pci register firstly, and then read back the value from this register. 
-> -> -> -> -> -> OK however as you show below the BAR being sized is the BAR if a bridge. Are -> -> you then adding a bridge device by hotplug? -> -> -No, I just simply hot plugged a VFIO device to Bus 0, another interesting -> -phenomenon is -> -If I hot plug the device to other bus, this doesn't happened. -> -> -> -> -> -> -> > We didn't handle this value specially while process pci write in -> -> > qemu, the function call stack is: -> -> > -> -> > Pci_bridge_dev_write_config -> -> > -> -> > -> pci_bridge_write_config -> -> > -> -> > -> pci_default_write_config (we update the config[address] value here -> -> > -> to -> -> > fffffffffc800000, which should be 0xfc800000 ) -> -> > -> -> > -> pci_bridge_update_mappings -> -> > -> -> > ->pci_bridge_region_del(br, br->windows); -> -> > -> -> > -> pci_bridge_region_init -> -> > -> -> > -> -> -> > pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong -> -> > value -> -> > fffffffffc800000) -> -> > -> -> > -> -> -> > memory_region_transaction_commit -> -> > -> -> > -> -> > -> -> > So, as we can see, we use the wrong base address in qemu to update the -> -> > memory regions, though, we update the base address to -> -> > -> -> > The correct value after pci driver in VM write the original value -> -> > back, the virtio NIC in bus 4 may still sends net packets concurrently -> -> > with -> -> > -> -> > The wrong memory region address. -> -> > -> -> > -> -> > -> -> > We have tried to skip the memory region update action in qemu while -> -> > detect pci write with 0xffffffff value, and it does work, but -> -> > -> -> > This seems to be not gently. -> -> -> -> For sure. But I'm still puzzled as to why does Linux try to size the BAR of -> -> the -> -> bridge while a device behind it is used. -> -> -> -> Can you pls post your QEMU command line? 
-> -> -My QEMU command line: -> -/root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S -object -> -secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-Linux/master-key.aes -> --machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu -> -host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m -> -size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp -> -20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024 -numa -> -node,nodeid=1,cpus=5-9,mem=1024 -numa node,nodeid=2,cpus=10-14,mem=1024 -numa -> -node,nodeid=3,cpus=15-19,mem=1024 -uuid 34a588c7-b0f2-4952-b39c-47fae3411439 -> --no-user-config -nodefaults -chardev -> -socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/monitor.sock,server,nowait -> --mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet -> --global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on -device -> -pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device -> -pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device -> -pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device -> -pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device -> -pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device -> -piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device -> -usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device -> -nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device -> -virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device -> -virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device -> -virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device -> -virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device -> -virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive -> -file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-virtio-disk0,cache=none -> --device -> -virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -> --drive if=none,id=drive-ide0-1-1,readonly=on,cache=none -device -> 
-ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev -> -tap,fd=35,id=hostnet0 -device -> -virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4,addr=0x1 -> --chardev pty,id=charserial0 -device -> -isa-serial,chardev=charserial0,id=serial0 -device -> -usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device -> -cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device -> -virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on -> -> -I am also very curious about this issue, in the linux kernel code, maybe -> -double check in function pci_bridge_check_ranges triggered this problem. -If you can get the stacktrace in Linux when it tries to write this -fffff value, that would be quite helpful. - - -> -> -> -> -> -> -> -> -> > -> -> > -> -> > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c -> -> > -> -> > index b2e50c3..84b405d 100644 -> -> > -> -> > --- a/hw/pci/pci_bridge.c -> -> > -> -> > +++ b/hw/pci/pci_bridge.c -> -> > -> -> > @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d, -> -> > -> -> > pci_default_write_config(d, address, val, len); -> -> > -> -> > - if (ranges_overlap(address, len, PCI_COMMAND, 2) || -> -> > -> -> > + if ( (val != 0xffffffff) && -> -> > -> -> > + (ranges_overlap(address, len, PCI_COMMAND, 2) || -> -> > -> -> > /* io base/limit */ -> -> > -> -> > ranges_overlap(address, len, PCI_IO_BASE, 2) || -> -> > -> -> > @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d, -> -> > -> -> > ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || -> -> > -> -> > /* vga enable */ -> -> > -> -> > - ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { -> -> > -> -> > + ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) { -> -> > -> -> > pci_bridge_update_mappings(s); -> -> > -> -> > } -> -> > -> -> > -> -> > -> -> > Thinks, -> -> > -> -> > Xu -> -> > - -On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: -> -> > > Hi all, -> -> > > -> -> > > -> -> > > -> -> > > In our 
test, we configured VM with several pci-bridges and a -> -> > > virtio-net nic been attached with bus 4, -> -> > > -> -> > > After VM is startup, We ping this nic from host to judge if it is -> -> > > working normally. Then, we hot add pci devices to this VM with bus 0. -> -> > > -> -> > > We found the virtio-net NIC in bus 4 is not working (can not -> -> > > connect) occasionally, as it kick virtio backend failure with error -> -> > > below: -> -> > > -> -> > > Unassigned mem write 00000000fc803004 = 0x1 -> -> > > -> -> > > -> -> > > -> -> > > memory-region: pci_bridge_pci -> -> > > -> -> > > 0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci -> -> > > -> -> > > 00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci -> -> > > -> -> > > 00000000fc800000-00000000fc800fff (prio 0, RW): -> -> > > virtio-pci-common -> -> > > -> -> > > 00000000fc801000-00000000fc801fff (prio 0, RW): -> -> > > virtio-pci-isr -> -> > > -> -> > > 00000000fc802000-00000000fc802fff (prio 0, RW): -> -> > > virtio-pci-device -> -> > > -> -> > > 00000000fc803000-00000000fc803fff (prio 0, RW): -> -> > > virtio-pci-notify <- io mem unassigned -> -> > > -> -> > > ⦠-> -> > > -> -> > > -> -> > > -> -> > > We caught an exceptional address changing while this problem -> -> > > happened, show as -> -> > > follow: -> -> > > -> -> > > Before pci_bridge_update_mappingsï¼ -> -> > > -> -> > > 00000000fc000000-00000000fc1fffff (prio 1, RW): alias -> -> > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > 00000000fc000000-00000000fc1fffff -> -> > > -> -> > > 00000000fc200000-00000000fc3fffff (prio 1, RW): alias -> -> > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > 00000000fc200000-00000000fc3fffff -> -> > > -> -> > > 00000000fc400000-00000000fc5fffff (prio 1, RW): alias -> -> > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > 00000000fc400000-00000000fc5fffff -> -> > > -> -> > > 00000000fc600000-00000000fc7fffff (prio 1, RW): alias -> -> > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > 
00000000fc600000-00000000fc7fffff -> -> > > -> -> > > 00000000fc800000-00000000fc9fffff (prio 1, RW): alias -> -> > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > 00000000fc800000-00000000fc9fffff -> -> > > <- correct Adress Spce -> -> > > -> -> > > 00000000fca00000-00000000fcbfffff (prio 1, RW): alias -> -> > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > 00000000fca00000-00000000fcbfffff -> -> > > -> -> > > 00000000fcc00000-00000000fcdfffff (prio 1, RW): alias -> -> > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > 00000000fcc00000-00000000fcdfffff -> -> > > -> -> > > 00000000fce00000-00000000fcffffff (prio 1, RW): alias -> -> > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > 00000000fce00000-00000000fcffffff -> -> > > -> -> > > -> -> > > -> -> > > After pci_bridge_update_mappingsï¼ -> -> > > -> -> > > 00000000fda00000-00000000fdbfffff (prio 1, RW): alias -> -> > > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff -> -> > > -> -> > > 00000000fdc00000-00000000fddfffff (prio 1, RW): alias -> -> > > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff -> -> > > -> -> > > 00000000fde00000-00000000fdffffff (prio 1, RW): alias -> -> > > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff -> -> > > -> -> > > 00000000fe000000-00000000fe1fffff (prio 1, RW): alias -> -> > > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff -> -> > > -> -> > > 00000000fe200000-00000000fe3fffff (prio 1, RW): alias -> -> > > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff -> -> > > -> -> > > 00000000fe400000-00000000fe5fffff (prio 1, RW): alias -> -> > > pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff -> -> > > -> -> > > 00000000fe600000-00000000fe7fffff (prio 1, RW): alias -> -> > > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff -> -> > > -> -> > > 00000000fe800000-00000000fe9fffff (prio 1, RW): alias -> -> > > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff -> -> > > 
-> -> > > fffffffffc800000-fffffffffc800000 (prio 1, RW): alias -> -pci_bridge_pref_mem -> -> > > @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- Exceptional -> -> > > Address -> -> > Space -> -> > -> -> > This one is empty though right? -> -> > -> -> > > -> -> > > -> -> > > We have figured out why this address becomes this value, -> -> > > according to pci spec, pci driver can get BAR address size by -> -> > > writing 0xffffffff to -> -> > > -> -> > > the pci register firstly, and then read back the value from this -> -> > > register. -> -> > -> -> > -> -> > OK however as you show below the BAR being sized is the BAR of a -> -> > bridge. Are you then adding a bridge device by hotplug? -> -> -> -> No, I just simply hot plugged a VFIO device to Bus 0, another -> -> interesting phenomenon is If I hot plug the device to other bus, this -> -> doesn't -> -happen. -> -> -> -> > -> -> > -> -> > > We didn't handle this value specially while processing the pci write in -> -> > > qemu, the function call stack is: -> -> > > -> -> > > pci_bridge_dev_write_config -> -> > > -> -> > > -> pci_bridge_write_config -> -> > > -> -> > > -> pci_default_write_config (we update the config[address] value -> -> > > -> here to -> -> > > fffffffffc800000, which should be 0xfc800000 ) -> -> > > -> -> > > -> pci_bridge_update_mappings -> -> > > -> -> > > ->pci_bridge_region_del(br, br->windows); -> -> > > -> -> > > -> pci_bridge_region_init -> -> > > -> -> > > -> -> -> > > pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong -> -> > > value -> -> > > fffffffffc800000) -> -> > > -> -> > > -> -> -> > > memory_region_transaction_commit -> -> > > -> -> > > -> -> > > -> -> > > So, as we can see, we use the wrong base address in qemu to update -> -> > > the memory regions, though, we update the base address to -> -> > > -> -> > > The correct value after pci driver in VM writes the original value -> -> > > back, the virtio NIC in bus 4 may still send net packets -> -> > > concurrently with 
-> -> > > -> -> > > The wrong memory region address. -> -> > > -> -> > > -> -> > > -> -> > > We have tried to skip the memory region update action in qemu -> -> > > while detect pci write with 0xffffffff value, and it does work, -> -> > > but -> -> > > -> -> > > This seems to be not gently. -> -> > -> -> > For sure. But I'm still puzzled as to why does Linux try to size the -> -> > BAR of the bridge while a device behind it is used. -> -> > -> -> > Can you pls post your QEMU command line? -> -> -> -> My QEMU command line: -> -> /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S -> -> -object -> -> secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194- -> -> Linux/master-key.aes -machine -> -> pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu -> -> host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m -> -> size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp -> -> 20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024 -> -> -numa node,nodeid=1,cpus=5-9,mem=1024 -numa -> -> node,nodeid=2,cpus=10-14,mem=1024 -numa -> -> node,nodeid=3,cpus=15-19,mem=1024 -uuid -> -> 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults -> -> -chardev -> -> socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/moni -> -> tor.sock,server,nowait -mon -> -> chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet -> -> -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on -> -> -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device -> -> pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device -> -> pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device -> -> pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device -> -> pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device -> -> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device -> -> usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device -> -> nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device -> -> 
virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device -> -> virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device -> -> virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device -> -> virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device -> -> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive -> -> file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-v -> -> irtio-disk0,cache=none -device -> -> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id -> -> =virtio-disk0,bootindex=1 -drive -> -> if=none,id=drive-ide0-1-1,readonly=on,cache=none -device -> -> ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev -> -> tap,fd=35,id=hostnet0 -device -> -> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4 -> -> ,addr=0x1 -chardev pty,id=charserial0 -device -> -> isa-serial,chardev=charserial0,id=serial0 -device -> -> usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device -> -> cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device -> -> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on -> -> -> -> I am also very curious about this issue, in the linux kernel code, maybe -> -> double -> -check in function pci_bridge_check_ranges triggered this problem. -> -> -If you can get the stacktrace in Linux when it tries to write this fffff -> -value, that -> -would be quite helpful. 
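The transient address reported earlier in this thread (fffffffffc800000 where 0xfc800000 was expected) is consistent with how a bridge's 64-bit prefetchable window base is assembled from two config registers. Below is a minimal C sketch of that arithmetic, assuming the register layout described in the PCI-to-PCI bridge specification. The helper names `pref_window_base`, `base_during_probe` and `show_probe_sequence` are hypothetical illustrations, not QEMU or Linux functions, and the low register value 0xfc81 is an assumed example chosen to match the addresses in the report.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not QEMU code): combine the 16-bit
 * PCI_PREF_MEMORY_BASE register and the 32-bit PCI_PREF_BASE_UPPER32
 * register into the 64-bit base of the prefetchable window. */
uint64_t pref_window_base(uint16_t base_lo, uint32_t base_upper32)
{
    /* Bits 15:4 of the low register carry address bits 31:20;
     * bits 3:0 only encode the window type, so mask them off. */
    uint64_t low = (uint64_t)(base_lo & 0xfff0u) << 16;
    return ((uint64_t)base_upper32 << 32) | low;
}

/* Base that is visible between the guest's probe write of 0xffffffff
 * to the upper register and the write that restores the old value. */
uint64_t base_during_probe(uint16_t base_lo)
{
    return pref_window_base(base_lo, 0xffffffffu);
}

/* Print the window base before, during and after the probe sequence. */
void show_probe_sequence(uint16_t base_lo, uint32_t saved_upper32)
{
    printf("before probe : 0x%016llx\n",
           (unsigned long long)pref_window_base(base_lo, saved_upper32));
    printf("during probe : 0x%016llx\n",
           (unsigned long long)base_during_probe(base_lo));
    printf("after restore: 0x%016llx\n",
           (unsigned long long)pref_window_base(base_lo, saved_upper32));
}
```

Calling `show_probe_sequence(0xfc81, 0)` from a small `main` prints the three states. The point of the sketch is that any memory-region update performed in the middle state uses a base with all upper bits set, which is the window the reporter saw being mapped while the guest was only probing the register.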
-> -After I add mdelay(100) in function pci_bridge_check_ranges, this phenomenon is -easier to reproduce, below is my modify in kernel: -diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c -index cb389277..86e232d 100644 ---- a/drivers/pci/setup-bus.c -+++ b/drivers/pci/setup-bus.c -@@ -27,7 +27,7 @@ - #include <linux/slab.h> - #include <linux/acpi.h> - #include "pci.h" -- -+#include <linux/delay.h> - unsigned int pci_flags; - - struct pci_dev_resource { -@@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus *bus) - pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, - 0xffffffff); - pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp); -+ mdelay(100); -+ printk(KERN_ERR "sleep\n"); -+ dump_stack(); - if (!tmp) - b_res[2].flags &= ~IORESOURCE_MEM_64; - pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, - -After hot plugging, we get the following log: - -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:14.0: BAR 0: assigned [mem -0xc2360000-0xc237ffff 64bit pref] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:14.0: BAR 3: assigned [mem -0xc2328000-0xc232bfff 64bit pref] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: 
bridge window [io -0xd000-0xdfff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:16 uefi-linux kernel: pci 
0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 
11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] 
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem 
-0xc2600000-0xc27fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:18 uefi-linux kernel: pci 
0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:18 uefi-linux kernel: sleep -Dec 11 09:28:18 uefi-linux kernel: CPU: 16 PID: 502 Comm: kworker/u40:1 Not -tainted 4.11.0-rc3+ #11 -Dec 11 09:28:18 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + -PIIX, 1996), BIOS 0.0.0 02/06/2015 -Dec 11 09:28:18 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn -Dec 11 09:28:18 uefi-linux kernel: Call Trace: -Dec 11 09:28:18 uefi-linux kernel: dump_stack+0x63/0x87 -Dec 11 09:28:18 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960 -Dec 11 09:28:18 uefi-linux kernel: ? dev_printk+0x4d/0x50 -Dec 11 09:28:18 uefi-linux kernel: enable_slot+0x140/0x2f0 -Dec 11 09:28:18 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80 -Dec 11 09:28:18 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120 -Dec 11 09:28:18 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120 -Dec 11 09:28:18 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0 -Dec 11 09:28:18 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0 -Dec 11 09:28:18 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3 -Dec 11 09:28:18 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29 -Dec 11 09:28:18 uefi-linux kernel: process_one_work+0x165/0x410 -Dec 11 09:28:18 uefi-linux kernel: worker_thread+0x137/0x4c0 -Dec 11 09:28:18 uefi-linux kernel: kthread+0x101/0x140 -Dec 11 09:28:18 uefi-linux kernel: ? rescuer_thread+0x380/0x380 -Dec 11 09:28:18 uefi-linux kernel: ? 
kthread_park+0x90/0x90 -Dec 11 09:28:18 uefi-linux kernel: ret_from_fork+0x2c/0x40 -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:18 uefi-linux kernel: sleep -Dec 11 09:28:18 uefi-linux kernel: CPU: 16 PID: 502 Comm: kworker/u40:1 Not -tainted 4.11.0-rc3+ #11 -Dec 11 09:28:18 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + -PIIX, 1996), BIOS 0.0.0 02/06/2015 -Dec 11 09:28:18 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn -Dec 11 09:28:18 uefi-linux kernel: Call Trace: -Dec 11 09:28:18 uefi-linux kernel: dump_stack+0x63/0x87 -Dec 11 09:28:18 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960 -Dec 11 09:28:18 uefi-linux 
kernel: ? dev_printk+0x4d/0x50 -Dec 11 09:28:18 uefi-linux kernel: enable_slot+0x140/0x2f0 -Dec 11 09:28:18 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80 -Dec 11 09:28:18 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120 -Dec 11 09:28:18 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120 -Dec 11 09:28:18 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0 -Dec 11 09:28:18 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0 -Dec 11 09:28:18 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3 -Dec 11 09:28:18 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29 -Dec 11 09:28:18 uefi-linux kernel: process_one_work+0x165/0x410 -Dec 11 09:28:18 uefi-linux kernel: worker_thread+0x137/0x4c0 -Dec 11 09:28:18 uefi-linux kernel: kthread+0x101/0x140 -Dec 11 09:28:18 uefi-linux kernel: ? rescuer_thread+0x380/0x380 -Dec 11 09:28:18 uefi-linux kernel: ? kthread_park+0x90/0x90 -Dec 11 09:28:18 uefi-linux kernel: ret_from_fork+0x2c/0x40 -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:18 uefi-linux kernel: pci 
0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:19 uefi-linux kernel: sleep -Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not -tainted 4.11.0-rc3+ #11 -Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + -PIIX, 1996), BIOS 0.0.0 02/06/2015 -Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn -Dec 11 09:28:19 uefi-linux kernel: Call Trace: -Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87 -Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960 -Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50 -Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0 -Dec 11 09:28:19 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80 -Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120 -Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120 -Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0 -Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0 -Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3 -Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29 -Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410 -Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0 -Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140 -Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380 -Dec 11 09:28:19 uefi-linux kernel: ? 
kthread_park+0x90/0x90 -Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40 -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:19 uefi-linux kernel: sleep -Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not -tainted 4.11.0-rc3+ #11 -Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + -PIIX, 1996), BIOS 0.0.0 02/06/2015 -Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn -Dec 11 09:28:19 uefi-linux kernel: Call Trace: -Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87 -Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960 -Dec 11 09:28:19 uefi-linux 
kernel: ? pci_conf1_read+0xba/0x100 -Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0xe9/0x960 -Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50 -Dec 11 09:28:19 uefi-linux kernel: ? pcibios_allocate_rom_resources+0x45/0x80 -Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0 -Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120 -Dec 11 09:28:19 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80 -Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120 -Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120 -Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0 -Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0 -Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3 -Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29 -Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410 -Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0 -Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140 -Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380 -Dec 11 09:28:19 uefi-linux kernel: ? kthread_park+0x90/0x90 -Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40 -Dec 11 09:28:19 uefi-linux kernel: sleep -Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not -tainted 4.11.0-rc3+ #11 -Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + -PIIX, 1996), BIOS 0.0.0 02/06/2015 -Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn -Dec 11 09:28:19 uefi-linux kernel: Call Trace: -Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87 -Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960 -Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50 -Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0 -Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120 -Dec 11 09:28:19 uefi-linux kernel: ? 
__pm_runtime_resume+0x5c/0x80 -Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120 -Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120 -Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0 -Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0 -Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3 -Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29 -Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410 -Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0 -Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140 -Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380 -Dec 11 09:28:19 uefi-linux kernel: ? kthread_park+0x90/0x90 -Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40 -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to 
[bus 05] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI 
bridge to [bus 02] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 lost sync at byte 1 -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 - driver resynced. 
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 lost sync at byte 1 -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 - driver resynced. -Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 lost sync at byte 1 -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 - driver resynced. -Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 lost sync at byte 1 -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 - driver resynced. -Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 lost sync at byte 1 -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 - driver resynced. 
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 lost sync at byte 1 -Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 - driver resynced. 
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 lost sync at byte 1 -Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 - driver resynced. 
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 lost sync at byte 1 -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 - driver resynced. -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 lost sync at byte 1 -Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 lost sync at byte 1 -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at -isa0060/serio1/input0 - driver resynced. 
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem 
-0xc2000000-0xc21fffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2600000-0xc27fffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0: bridge window [io -0xf000-0xffff] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2800000-0xc29fffff] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem -0xc2b00000-0xc2cfffff 64bit pref] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0: bridge window [io -0xe000-0xefff] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0: bridge window 
[mem -0xc2600000-0xc27fffff] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem -0xc2d00000-0xc2efffff 64bit pref] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io -0xd000-0xdfff] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2400000-0xc25fffff] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem -0xc2f00000-0xc30fffff 64bit pref] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io -0xc000-0xcfff] -Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem -0xc2000000-0xc21fffff] - -> -> -> -> > -> -> > -> -> > -> -> > > -> -> > > -> -> > > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c -> -> > > -> -> > > index b2e50c3..84b405d 100644 -> -> > > -> -> > > --- a/hw/pci/pci_bridge.c -> -> > > -> -> > > +++ b/hw/pci/pci_bridge.c -> -> > > -> -> > > @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d, -> -> > > -> -> > > pci_default_write_config(d, address, val, len); -> -> > > -> -> > > - if (ranges_overlap(address, len, PCI_COMMAND, 2) || -> -> > > -> -> > > + if ( (val != 0xffffffff) && -> -> > > -> -> > > + (ranges_overlap(address, len, PCI_COMMAND, 2) || -> -> > > -> -> > > /* io base/limit */ -> -> > > -> -> > > ranges_overlap(address, len, PCI_IO_BASE, 2) || -> -> > > -> -> > > @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d, -> -> > > -> -> > > ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || -> -> > > -> -> > > /* vga enable */ -> -> > > -> -> > > - ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { -> -> > > -> -> > > + ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) { -> -> > > -> -> > > pci_bridge_update_mappings(s); -> -> > > -> -> > > } -> -> > > -> -> > > -> -> > > -> -> > > Thinks, -> -> > > -> -> > > Xu -> -> > > - -On Tue, Dec 
11, 2018 at 01:47:37AM +0000, xuyandong wrote: -> -On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: -> -> > > > Hi all, -> -> > > > -> -> > > > -> -> > > > -> -> > > > In our test, we configured VM with several pci-bridges and a -> -> > > > virtio-net nic been attached with bus 4, -> -> > > > -> -> > > > After VM is startup, We ping this nic from host to judge if it is -> -> > > > working normally. Then, we hot add pci devices to this VM with bus 0. -> -> > > > -> -> > > > We found the virtio-net NIC in bus 4 is not working (can not -> -> > > > connect) occasionally, as it kick virtio backend failure with error -> -> > > > below: -> -> > > > -> -> > > > Unassigned mem write 00000000fc803004 = 0x1 -> -> > > > -> -> > > > -> -> > > > -> -> > > > memory-region: pci_bridge_pci -> -> > > > -> -> > > > 0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci -> -> > > > -> -> > > > 00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci -> -> > > > -> -> > > > 00000000fc800000-00000000fc800fff (prio 0, RW): -> -> > > > virtio-pci-common -> -> > > > -> -> > > > 00000000fc801000-00000000fc801fff (prio 0, RW): -> -> > > > virtio-pci-isr -> -> > > > -> -> > > > 00000000fc802000-00000000fc802fff (prio 0, RW): -> -> > > > virtio-pci-device -> -> > > > -> -> > > > 00000000fc803000-00000000fc803fff (prio 0, RW): -> -> > > > virtio-pci-notify <- io mem unassigned -> -> > > > -> -> > > > … -> -> > > > -> -> > > > -> -> > > > -> -> > > > We caught an exceptional address changing while this problem -> -> > > > happened, show as -> -> > > > follow: -> -> > > > -> -> > > > Before pci_bridge_update_mappings: -> -> > > > -> -> > > > 00000000fc000000-00000000fc1fffff (prio 1, RW): alias -> -> > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > 00000000fc000000-00000000fc1fffff -> -> > > > -> -> > > > 00000000fc200000-00000000fc3fffff (prio 1, RW): alias -> -> > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > 00000000fc200000-00000000fc3fffff -> -> > > > -> -> > >
> 00000000fc400000-00000000fc5fffff (prio 1, RW): alias -> -> > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > 00000000fc400000-00000000fc5fffff -> -> > > > -> -> > > > 00000000fc600000-00000000fc7fffff (prio 1, RW): alias -> -> > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > 00000000fc600000-00000000fc7fffff -> -> > > > -> -> > > > 00000000fc800000-00000000fc9fffff (prio 1, RW): alias -> -> > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > 00000000fc800000-00000000fc9fffff -> -> > > > <- correct Adress Spce -> -> > > > -> -> > > > 00000000fca00000-00000000fcbfffff (prio 1, RW): alias -> -> > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > 00000000fca00000-00000000fcbfffff -> -> > > > -> -> > > > 00000000fcc00000-00000000fcdfffff (prio 1, RW): alias -> -> > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > 00000000fcc00000-00000000fcdfffff -> -> > > > -> -> > > > 00000000fce00000-00000000fcffffff (prio 1, RW): alias -> -> > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > 00000000fce00000-00000000fcffffff -> -> > > > -> -> > > > -> -> > > > -> -> > > > After pci_bridge_update_mappings: -> -> > > > -> -> > > > 00000000fda00000-00000000fdbfffff (prio 1, RW): alias -> -> > > > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff -> -> > > > -> -> > > > 00000000fdc00000-00000000fddfffff (prio 1, RW): alias -> -> > > > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff -> -> > > > -> -> > > > 00000000fde00000-00000000fdffffff (prio 1, RW): alias -> -> > > > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff -> -> > > > -> -> > > > 00000000fe000000-00000000fe1fffff (prio 1, RW): alias -> -> > > > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff -> -> > > > -> -> > > > 00000000fe200000-00000000fe3fffff (prio 1, RW): alias -> -> > > > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff -> -> > > > -> -> > > > 00000000fe400000-00000000fe5fffff (prio 1, RW): alias -> -> > > >
pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff -> -> > > > -> -> > > > 00000000fe600000-00000000fe7fffff (prio 1, RW): alias -> -> > > > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff -> -> > > > -> -> > > > 00000000fe800000-00000000fe9fffff (prio 1, RW): alias -> -> > > > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff -> -> > > > -> -> > > > fffffffffc800000-fffffffffc800000 (prio 1, RW): alias -> -> pci_bridge_pref_mem -> -> > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- Exceptional -> -> > > > Adress -> -> > > Space -> -> > > -> -> > > This one is empty though right? -> -> > > -> -> > > > -> -> > > > -> -> > > > We have figured out why this address becomes this value, -> -> > > > according to pci spec, pci driver can get BAR address size by -> -> > > > writing 0xffffffff to -> -> > > > -> -> > > > the pci register firstly, and then read back the value from this -> -> > > > register. -> -> > > -> -> > > -> -> > > OK however as you show below the BAR being sized is the BAR if a -> -> > > bridge. Are you then adding a bridge device by hotplug? -> -> > -> -> > No, I just simply hot plugged a VFIO device to Bus 0, another -> -> > interesting phenomenon is If I hot plug the device to other bus, this -> -> > doesn't -> -> happened. 
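The sizing handshake described in the quoted report (write 0xffffffff to the register, read the value back, then restore the original) can be sketched in C as follows. This is a minimal model under stated assumptions, not QEMU or Linux code: `struct fake_bar`, `bar_write`, `bar_read`, and `bar_probe_size` are hypothetical names, and the writable mask used in the usage note is chosen only so that the region comes out at 16 KiB, like the virtio-pci range fc800000-fc803fff listed in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit memory BAR. Only the bits set in
 * writable_mask can be changed by software; the rest are hardwired. */
struct fake_bar {
    uint32_t value;
    uint32_t writable_mask;
};

/* Emulate a config-space write: hardwired bits keep their old value. */
static void bar_write(struct fake_bar *bar, uint32_t val)
{
    bar->value = (bar->value & ~bar->writable_mask) |
                 (val & bar->writable_mask);
}

static uint32_t bar_read(const struct fake_bar *bar)
{
    return bar->value;
}

/* Probe the size: save, write all ones, read back, restore.
 * Between the all-ones write and the restore the register holds a
 * transient value that is not a real address. */
static uint32_t bar_probe_size(struct fake_bar *bar)
{
    assert(bar != NULL);

    uint32_t saved = bar_read(bar);
    bar_write(bar, 0xffffffffu);
    uint32_t probed = bar_read(bar);
    bar_write(bar, saved);

    /* The low 4 bits of a memory BAR are type flags, not address bits. */
    return ~(probed & ~0xfu) + 1u;
}
```

With `value = 0xfc800000` and `writable_mask = 0xffffc000`, `bar_probe_size` yields 0x4000 and leaves the register at its original value. The point relevant to this thread is the window between the two writes: anything that reacts to the register during that time sees the all-ones pattern.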
-> -> > -> -> > > -> -> > > -> -> > > > We didn't handle this value specially while process pci write in -> -> > > > qemu, the function call stack is: -> -> > > > -> -> > > > Pci_bridge_dev_write_config -> -> > > > -> -> > > > -> pci_bridge_write_config -> -> > > > -> -> > > > -> pci_default_write_config (we update the config[address] value -> -> > > > -> here to -> -> > > > fffffffffc800000, which should be 0xfc800000 ) -> -> > > > -> -> > > > -> pci_bridge_update_mappings -> -> > > > -> -> > > > ->pci_bridge_region_del(br, br->windows); -> -> > > > -> -> > > > -> pci_bridge_region_init -> -> > > > -> -> > > > -> -> -> > > > pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong -> -> > > > value -> -> > > > fffffffffc800000) -> -> > > > -> -> > > > -> -> -> > > > memory_region_transaction_commit -> -> > > > -> -> > > > -> -> > > > -> -> > > > So, as we can see, we use the wrong base address in qemu to update -> -> > > > the memory regions, though, we update the base address to -> -> > > > -> -> > > > The correct value after pci driver in VM write the original value -> -> > > > back, the virtio NIC in bus 4 may still sends net packets -> -> > > > concurrently with -> -> > > > -> -> > > > The wrong memory region address. -> -> > > > -> -> > > > -> -> > > > -> -> > > > We have tried to skip the memory region update action in qemu -> -> > > > while detect pci write with 0xffffffff value, and it does work, -> -> > > > but -> -> > > > -> -> > > > This seems to be not gently. -> -> > > -> -> > > For sure. But I'm still puzzled as to why does Linux try to size the -> -> > > BAR of the bridge while a device behind it is used. -> -> > > -> -> > > Can you pls post your QEMU command line? 
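The call chain quoted above ends with a base address of fffffffffc800000 instead of 0xfc800000. The arithmetic behind that value can be sketched as below. This is an illustration, not QEMU's implementation: `pref_window_base`, `pref_window_limit`, `window_is_empty`, and `transient_window_example` are hypothetical helpers, and the 16-bit register values 0xfc81 and 0xfc91 are assumptions chosen to match the fc800000-fc9fffff window in the report. The layout assumed is the standard one for a bridge's prefetchable window: bits 15:4 of the 16-bit base and limit registers supply address bits 31:20, and separate 32-bit registers supply bits 63:32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 15:4 of the 16-bit base/limit registers hold address bits 31:20;
 * the low 4 bits encode the window type (assumed 0x1 here, 64-bit). */
#define PREF_RANGE_MASK 0xfff0u

/* Combine the 16-bit base register with the upper 32 bits. */
static uint64_t pref_window_base(uint16_t base16, uint32_t upper32)
{
    uint64_t base = (uint64_t)(base16 & PREF_RANGE_MASK) << 16;
    return base | ((uint64_t)upper32 << 32);
}

/* Same for the limit; the low 20 address bits of a limit are all ones. */
static uint64_t pref_window_limit(uint16_t limit16, uint32_t upper32)
{
    uint64_t limit = ((uint64_t)(limit16 & PREF_RANGE_MASK) << 16) | 0xfffffu;
    return limit | ((uint64_t)upper32 << 32);
}

/* A window whose base lies above its limit forwards nothing. */
static bool window_is_empty(uint64_t base, uint64_t limit)
{
    return base > limit;
}

/* Walk through the values from the report. */
void transient_window_example(void)
{
    uint64_t limit = pref_window_limit(0xfc91, 0x00000000u);

    /* Normal state: upper 32 bits of the base are zero. */
    uint64_t good = pref_window_base(0xfc81, 0x00000000u);
    assert(good == 0xfc800000ULL && !window_is_empty(good, limit));

    /* Guest has written 0xffffffff to the upper-32 base register. */
    uint64_t bad = pref_window_base(0xfc81, 0xffffffffu);
    assert(bad == 0xfffffffffc800000ULL && window_is_empty(bad, limit));
}
```

Under these assumptions the transient base lands above the limit, which is consistent with the reviewer's remark that the exceptional region looks empty, and with accesses to fc803004 going unassigned until the guest writes the original value back.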
-> -> > -> -> > My QEMU command line: -> -> > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S -> -> > -object -> -> > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194- -> -> > Linux/master-key.aes -machine -> -> > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu -> -> > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m -> -> > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp -> -> > 20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024 -> -> > -numa node,nodeid=1,cpus=5-9,mem=1024 -numa -> -> > node,nodeid=2,cpus=10-14,mem=1024 -numa -> -> > node,nodeid=3,cpus=15-19,mem=1024 -uuid -> -> > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults -> -> > -chardev -> -> > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/moni -> -> > tor.sock,server,nowait -mon -> -> > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet -> -> > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on -> -> > -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device -> -> > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device -> -> > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device -> -> > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device -> -> > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device -> -> > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device -> -> > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device -> -> > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device -> -> > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device -> -> > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device -> -> > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device -> -> > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device -> -> > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive -> -> > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-v -> -> > irtio-disk0,cache=none -device -> -> > 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id -> -> > =virtio-disk0,bootindex=1 -drive -> -> > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device -> -> > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev -> -> > tap,fd=35,id=hostnet0 -device -> -> > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4 -> -> > ,addr=0x1 -chardev pty,id=charserial0 -device -> -> > isa-serial,chardev=charserial0,id=serial0 -device -> -> > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device -> -> > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device -> -> > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on -> -> > -> -> > I am also very curious about this issue, in the linux kernel code, maybe -> -> > double -> -> check in function pci_bridge_check_ranges triggered this problem. -> -> -> -> If you can get the stacktrace in Linux when it tries to write this fffff -> -> value, that -> -> would be quite helpful. -> -> -> -> -After I add mdelay(100) in function pci_bridge_check_ranges, this phenomenon -> -is -> -easier to reproduce, below is my modify in kernel: -> -diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c -> -index cb389277..86e232d 100644 -> ---- a/drivers/pci/setup-bus.c -> -+++ b/drivers/pci/setup-bus.c -> -@@ -27,7 +27,7 @@ -> -#include <linux/slab.h> -> -#include <linux/acpi.h> -> -#include "pci.h" -> -- -> -+#include <linux/delay.h> -> -unsigned int pci_flags; -> -> -struct pci_dev_resource { -> -@@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus *bus) -> -pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, -> -0xffffffff); -> -pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp); -> -+ mdelay(100); -> -+ printk(KERN_ERR "sleep\n"); -> -+ dump_stack(); -> -if (!tmp) -> -b_res[2].flags &= ~IORESOURCE_MEM_64; -> -pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, -> -OK! -I just sent a Linux patch that should help. 
-I would appreciate it if you will give it a try -and if that helps reply to it with -a Tested-by: tag. - --- -MST - -On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote: -> -> On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: -> -> > > > > Hi all, -> -> > > > > -> -> > > > > -> -> > > > > -> -> > > > > In our test, we configured VM with several pci-bridges and a -> -> > > > > virtio-net nic been attached with bus 4, -> -> > > > > -> -> > > > > After VM is startup, We ping this nic from host to judge if it -> -> > > > > is working normally. Then, we hot add pci devices to this VM with -> -> > > > > bus -> -0. -> -> > > > > -> -> > > > > We found the virtio-net NIC in bus 4 is not working (can not -> -> > > > > connect) occasionally, as it kick virtio backend failure with error -> -> > > > > below: -> -> > > > > -> -> > > > > Unassigned mem write 00000000fc803004 = 0x1 -> -> > > > > -> -> > > > > -> -> > > > > -> -> > > > > memory-region: pci_bridge_pci -> -> > > > > -> -> > > > > 0000000000000000-ffffffffffffffff (prio 0, RW): -> -> > > > > pci_bridge_pci -> -> > > > > -> -> > > > > 00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci -> -> > > > > -> -> > > > > 00000000fc800000-00000000fc800fff (prio 0, RW): -> -> > > > > virtio-pci-common -> -> > > > > -> -> > > > > 00000000fc801000-00000000fc801fff (prio 0, RW): -> -> > > > > virtio-pci-isr -> -> > > > > -> -> > > > > 00000000fc802000-00000000fc802fff (prio 0, RW): -> -> > > > > virtio-pci-device -> -> > > > > -> -> > > > > 00000000fc803000-00000000fc803fff (prio 0, RW): -> -> > > > > virtio-pci-notify <- io mem unassigned -> -> > > > > -> -> > > > > … -> -> > > > > -> -> > > > > -> -> > > > > -> -> > > > > We caught an exceptional address changing while this problem -> -> > > > > happened, show as -> -> > > > > follow: -> -> > > > > -> -> > > > > Before pci_bridge_update_mappings: -> -> > > > > -> -> > > > > 00000000fc000000-00000000fc1fffff (prio 1, RW): alias -> -> > > > >
pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > 00000000fc000000-00000000fc1fffff -> -> > > > > -> -> > > > > 00000000fc200000-00000000fc3fffff (prio 1, RW): alias -> -> > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > 00000000fc200000-00000000fc3fffff -> -> > > > > -> -> > > > > 00000000fc400000-00000000fc5fffff (prio 1, RW): alias -> -> > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > 00000000fc400000-00000000fc5fffff -> -> > > > > -> -> > > > > 00000000fc600000-00000000fc7fffff (prio 1, RW): alias -> -> > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > 00000000fc600000-00000000fc7fffff -> -> > > > > -> -> > > > > 00000000fc800000-00000000fc9fffff (prio 1, RW): alias -> -> > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > 00000000fc800000-00000000fc9fffff -> -> > > > > <- correct Adress Spce -> -> > > > > -> -> > > > > 00000000fca00000-00000000fcbfffff (prio 1, RW): alias -> -> > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > 00000000fca00000-00000000fcbfffff -> -> > > > > -> -> > > > > 00000000fcc00000-00000000fcdfffff (prio 1, RW): alias -> -> > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > 00000000fcc00000-00000000fcdfffff -> -> > > > > -> -> > > > > 00000000fce00000-00000000fcffffff (prio 1, RW): alias -> -> > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > 00000000fce00000-00000000fcffffff -> -> > > > > -> -> > > > > -> -> > > > > -> -> > > > > After pci_bridge_update_mappings: -> -> > > > > -> -> > > > > 00000000fda00000-00000000fdbfffff (prio 1, RW): alias -> -> > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > 00000000fda00000-00000000fdbfffff -> -> > > > > -> -> > > > > 00000000fdc00000-00000000fddfffff (prio 1, RW): alias -> -> > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > 00000000fdc00000-00000000fddfffff -> -> > > > > -> -> > > > > 00000000fde00000-00000000fdffffff (prio 1, RW): alias -> -> > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > 00000000fde00000-00000000fdffffff ->
-> > > > > -> -> > > > > 00000000fe000000-00000000fe1fffff (prio 1, RW): alias -> -> > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > 00000000fe000000-00000000fe1fffff -> -> > > > > -> -> > > > > 00000000fe200000-00000000fe3fffff (prio 1, RW): alias -> -> > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > 00000000fe200000-00000000fe3fffff -> -> > > > > -> -> > > > > 00000000fe400000-00000000fe5fffff (prio 1, RW): alias -> -> > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > 00000000fe400000-00000000fe5fffff -> -> > > > > -> -> > > > > 00000000fe600000-00000000fe7fffff (prio 1, RW): alias -> -> > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > 00000000fe600000-00000000fe7fffff -> -> > > > > -> -> > > > > 00000000fe800000-00000000fe9fffff (prio 1, RW): alias -> -> > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > 00000000fe800000-00000000fe9fffff -> -> > > > > -> -> > > > > fffffffffc800000-fffffffffc800000 (prio 1, RW): alias -> -> > pci_bridge_pref_mem -> -> > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- Exceptional -> -Adress -> -> > > > Space -> -> > > > -> -> > > > This one is empty though right? -> -> > > > -> -> > > > > -> -> > > > > -> -> > > > > We have figured out why this address becomes this value, -> -> > > > > according to pci spec, pci driver can get BAR address size by -> -> > > > > writing 0xffffffff to -> -> > > > > -> -> > > > > the pci register firstly, and then read back the value from this -> -> > > > > register. -> -> > > > -> -> > > > -> -> > > > OK however as you show below the BAR being sized is the BAR if a -> -> > > > bridge. Are you then adding a bridge device by hotplug? -> -> > > -> -> > > No, I just simply hot plugged a VFIO device to Bus 0, another -> -> > > interesting phenomenon is If I hot plug the device to other bus, -> -> > > this doesn't -> -> > happened. 
-> -> > > -> -> > > > -> -> > > > -> -> > > > > We didn't handle this value specially while process pci write -> -> > > > > in qemu, the function call stack is: -> -> > > > > -> -> > > > > Pci_bridge_dev_write_config -> -> > > > > -> -> > > > > -> pci_bridge_write_config -> -> > > > > -> -> > > > > -> pci_default_write_config (we update the config[address] -> -> > > > > -> value here to -> -> > > > > fffffffffc800000, which should be 0xfc800000 ) -> -> > > > > -> -> > > > > -> pci_bridge_update_mappings -> -> > > > > -> -> > > > > ->pci_bridge_region_del(br, br->windows); -> -> > > > > -> -> > > > > -> pci_bridge_region_init -> -> > > > > -> -> > > > > -> -> > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use the -> -> > > > > wrong value -> -> > > > > fffffffffc800000) -> -> > > > > -> -> > > > > -> -> -> > > > > memory_region_transaction_commit -> -> > > > > -> -> > > > > -> -> > > > > -> -> > > > > So, as we can see, we use the wrong base address in qemu to -> -> > > > > update the memory regions, though, we update the base address -> -> > > > > to -> -> > > > > -> -> > > > > The correct value after pci driver in VM write the original -> -> > > > > value back, the virtio NIC in bus 4 may still sends net -> -> > > > > packets concurrently with -> -> > > > > -> -> > > > > The wrong memory region address. -> -> > > > > -> -> > > > > -> -> > > > > -> -> > > > > We have tried to skip the memory region update action in qemu -> -> > > > > while detect pci write with 0xffffffff value, and it does -> -> > > > > work, but -> -> > > > > -> -> > > > > This seems to be not gently. -> -> > > > -> -> > > > For sure. But I'm still puzzled as to why does Linux try to size -> -> > > > the BAR of the bridge while a device behind it is used. -> -> > > > -> -> > > > Can you pls post your QEMU command line? 
-> -> > > -> -> > > My QEMU command line: -> -> > > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S -> -> > > -object -> -> > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain- -> -> > > 194- -> -> > > Linux/master-key.aes -machine -> -> > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu -> -> > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m -> -> > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp -> -> > > 20,sockets=20,cores=1,threads=1 -numa -> -> > > node,nodeid=0,cpus=0-4,mem=1024 -numa -> -> > > node,nodeid=1,cpus=5-9,mem=1024 -numa -> -> > > node,nodeid=2,cpus=10-14,mem=1024 -numa -> -> > > node,nodeid=3,cpus=15-19,mem=1024 -uuid -> -> > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults -> -> > > -chardev -> -> > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/ -> -> > > moni -> -> > > tor.sock,server,nowait -mon -> -> > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet -> -> > > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot -> -> > > strict=on -device -> -> > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device -> -> > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device -> -> > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device -> -> > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device -> -> > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device -> -> > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device -> -> > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device -> -> > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device -> -> > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device -> -> > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device -> -> > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device -> -> > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device -> -> > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive -> -> > > 
file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=dri -> -> > > ve-v -> -> > > irtio-disk0,cache=none -device -> -> > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk -> -> > > 0,id -> -> > > =virtio-disk0,bootindex=1 -drive -> -> > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device -> -> > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev -> -> > > tap,fd=35,id=hostnet0 -device -> -> > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=p -> -> > > ci.4 -> -> > > ,addr=0x1 -chardev pty,id=charserial0 -device -> -> > > isa-serial,chardev=charserial0,id=serial0 -device -> -> > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device -> -> > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device -> -> > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg -> -> > > timestamp=on -> -> > > -> -> > > I am also very curious about this issue, in the linux kernel code, -> -> > > maybe double -> -> > check in function pci_bridge_check_ranges triggered this problem. -> -> > -> -> > If you can get the stacktrace in Linux when it tries to write this -> -> > fffff value, that would be quite helpful. 
-> -> > -> -> -> -> After I add mdelay(100) in function pci_bridge_check_ranges, this -> -> phenomenon is easier to reproduce, below is my modify in kernel: -> -> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index -> -> cb389277..86e232d 100644 -> -> --- a/drivers/pci/setup-bus.c -> -> +++ b/drivers/pci/setup-bus.c -> -> @@ -27,7 +27,7 @@ -> -> #include <linux/slab.h> -> -> #include <linux/acpi.h> -> -> #include "pci.h" -> -> - -> -> +#include <linux/delay.h> -> -> unsigned int pci_flags; -> -> -> -> struct pci_dev_resource { -> -> @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus -> -*bus) -> -> pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, -> -> 0xffffffff); -> -> pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, -> -> &tmp); -> -> + mdelay(100); -> -> + printk(KERN_ERR "sleep\n"); -> -> + dump_stack(); -> -> if (!tmp) -> -> b_res[2].flags &= ~IORESOURCE_MEM_64; -> -> pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, -> -> -> -> -OK! -> -I just sent a Linux patch that should help. -> -I would appreciate it if you will give it a try and if that helps reply to it -> -with a -> -Tested-by: tag. -> -I tested this patch and it works fine on my machine. - -But I have another question, if we only fix this problem in the kernel, the -Linux -version that has been released does not work well on the virtualization -platform. -Is there a way to fix this problem in the backend? - -> --- -> -MST - -On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote: -> -On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote: -> -> > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: -> -> > > > > > Hi all, -> -> > > > > > -> -> > > > > > -> -> > > > > > -> -> > > > > > In our test, we configured VM with several pci-bridges and a -> -> > > > > > virtio-net nic been attached with bus 4, -> -> > > > > > -> -> > > > > > After VM is startup, We ping this nic from host to judge if it -> -> > > > > > is working normally. 
Then, we hot add pci devices to this VM with -> -> > > > > > bus -> -> 0. -> -> > > > > > -> -> > > > > > We found the virtio-net NIC in bus 4 is not working (can not -> -> > > > > > connect) occasionally, as it kick virtio backend failure with -> -> > > > > > error below: -> -> > > > > > -> -> > > > > > Unassigned mem write 00000000fc803004 = 0x1 -> -> > > > > > -> -> > > > > > -> -> > > > > > -> -> > > > > > memory-region: pci_bridge_pci -> -> > > > > > -> -> > > > > > 0000000000000000-ffffffffffffffff (prio 0, RW): -> -> > > > > > pci_bridge_pci -> -> > > > > > -> -> > > > > > 00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci -> -> > > > > > -> -> > > > > > 00000000fc800000-00000000fc800fff (prio 0, RW): -> -> > > > > > virtio-pci-common -> -> > > > > > -> -> > > > > > 00000000fc801000-00000000fc801fff (prio 0, RW): -> -> > > > > > virtio-pci-isr -> -> > > > > > -> -> > > > > > 00000000fc802000-00000000fc802fff (prio 0, RW): -> -> > > > > > virtio-pci-device -> -> > > > > > -> -> > > > > > 00000000fc803000-00000000fc803fff (prio 0, RW): -> -> > > > > > virtio-pci-notify <- io mem unassigned -> -> > > > > > -> -> > > > > > ... -> -> > > > > > -> -> > > > > > -> -> > > > > > -> -> > > > > > We caught an exceptional address changing while this problem -> -> > > > > > happened, show as -> -> > > > > > follow: -> -> > > > > > -> -> > > > > > Before pci_bridge_update_mappings: -> -> > > > > > -> -> > > > > > 00000000fc000000-00000000fc1fffff (prio 1, RW): alias -> -> > > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > 00000000fc000000-00000000fc1fffff -> -> > > > > > -> -> > > > > > 00000000fc200000-00000000fc3fffff (prio 1, RW): alias -> -> > > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > 00000000fc200000-00000000fc3fffff -> -> > > > > > -> -> > > > > > 00000000fc400000-00000000fc5fffff (prio 1, RW): alias -> -> > > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > 00000000fc400000-00000000fc5fffff -> -> > > > > > -> -> > > > > > 
00000000fc600000-00000000fc7fffff (prio 1, RW): alias -> -> > > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > 00000000fc600000-00000000fc7fffff -> -> > > > > > -> -> > > > > > 00000000fc800000-00000000fc9fffff (prio 1, RW): alias -> -> > > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > 00000000fc800000-00000000fc9fffff -> -> > > > > > <- correct Adress Spce -> -> > > > > > -> -> > > > > > 00000000fca00000-00000000fcbfffff (prio 1, RW): alias -> -> > > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > 00000000fca00000-00000000fcbfffff -> -> > > > > > -> -> > > > > > 00000000fcc00000-00000000fcdfffff (prio 1, RW): alias -> -> > > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > 00000000fcc00000-00000000fcdfffff -> -> > > > > > -> -> > > > > > 00000000fce00000-00000000fcffffff (prio 1, RW): alias -> -> > > > > > pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > 00000000fce00000-00000000fcffffff -> -> > > > > > -> -> > > > > > -> -> > > > > > -> -> > > > > > After pci_bridge_update_mappings: -> -> > > > > > -> -> > > > > > 00000000fda00000-00000000fdbfffff (prio 1, RW): alias -> -> > > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > > 00000000fda00000-00000000fdbfffff -> -> > > > > > -> -> > > > > > 00000000fdc00000-00000000fddfffff (prio 1, RW): alias -> -> > > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > > 00000000fdc00000-00000000fddfffff -> -> > > > > > -> -> > > > > > 00000000fde00000-00000000fdffffff (prio 1, RW): alias -> -> > > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > > 00000000fde00000-00000000fdffffff -> -> > > > > > -> -> > > > > > 00000000fe000000-00000000fe1fffff (prio 1, RW): alias -> -> > > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > > 00000000fe000000-00000000fe1fffff -> -> > > > > > -> -> > > > > > 00000000fe200000-00000000fe3fffff (prio 1, RW): alias -> -> > > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > > 00000000fe200000-00000000fe3fffff -> -> > > > > > -> -> > > > > > 
00000000fe400000-00000000fe5fffff (prio 1, RW): alias -> -> > > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > > 00000000fe400000-00000000fe5fffff -> -> > > > > > -> -> > > > > > 00000000fe600000-00000000fe7fffff (prio 1, RW): alias -> -> > > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > > 00000000fe600000-00000000fe7fffff -> -> > > > > > -> -> > > > > > 00000000fe800000-00000000fe9fffff (prio 1, RW): alias -> -> > > > > > pci_bridge_mem @pci_bridge_pci -> -> > > > > > 00000000fe800000-00000000fe9fffff -> -> > > > > > -> -> > > > > > fffffffffc800000-fffffffffc800000 (prio 1, RW): alias -> -> > > pci_bridge_pref_mem -> -> > > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- Exceptional -> -> Adress -> -> > > > > Space -> -> > > > > -> -> > > > > This one is empty though right? -> -> > > > > -> -> > > > > > -> -> > > > > > -> -> > > > > > We have figured out why this address becomes this value, -> -> > > > > > according to pci spec, pci driver can get BAR address size by -> -> > > > > > writing 0xffffffff to -> -> > > > > > -> -> > > > > > the pci register firstly, and then read back the value from this -> -> > > > > > register. -> -> > > > > -> -> > > > > -> -> > > > > OK however as you show below the BAR being sized is the BAR if a -> -> > > > > bridge. Are you then adding a bridge device by hotplug? -> -> > > > -> -> > > > No, I just simply hot plugged a VFIO device to Bus 0, another -> -> > > > interesting phenomenon is If I hot plug the device to other bus, -> -> > > > this doesn't -> -> > > happened. 
-> -> > > > -> -> > > > > -> -> > > > > -> -> > > > > > We didn't handle this value specially while process pci write -> -> > > > > > in qemu, the function call stack is: -> -> > > > > > -> -> > > > > > Pci_bridge_dev_write_config -> -> > > > > > -> -> > > > > > -> pci_bridge_write_config -> -> > > > > > -> -> > > > > > -> pci_default_write_config (we update the config[address] -> -> > > > > > -> value here to -> -> > > > > > fffffffffc800000, which should be 0xfc800000 ) -> -> > > > > > -> -> > > > > > -> pci_bridge_update_mappings -> -> > > > > > -> -> > > > > > ->pci_bridge_region_del(br, br->windows); -> -> > > > > > -> -> > > > > > -> pci_bridge_region_init -> -> > > > > > -> -> > > > > > -> -> > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use the -> -> > > > > > wrong value -> -> > > > > > fffffffffc800000) -> -> > > > > > -> -> > > > > > -> -> -> > > > > > memory_region_transaction_commit -> -> > > > > > -> -> > > > > > -> -> > > > > > -> -> > > > > > So, as we can see, we use the wrong base address in qemu to -> -> > > > > > update the memory regions, though, we update the base address -> -> > > > > > to -> -> > > > > > -> -> > > > > > The correct value after pci driver in VM write the original -> -> > > > > > value back, the virtio NIC in bus 4 may still sends net -> -> > > > > > packets concurrently with -> -> > > > > > -> -> > > > > > The wrong memory region address. -> -> > > > > > -> -> > > > > > -> -> > > > > > -> -> > > > > > We have tried to skip the memory region update action in qemu -> -> > > > > > while detect pci write with 0xffffffff value, and it does -> -> > > > > > work, but -> -> > > > > > -> -> > > > > > This seems to be not gently. -> -> > > > > -> -> > > > > For sure. But I'm still puzzled as to why does Linux try to size -> -> > > > > the BAR of the bridge while a device behind it is used. -> -> > > > > -> -> > > > > Can you pls post your QEMU command line? 
-> -> > > > -> -> > > > My QEMU command line: -> -> > > > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S -> -> > > > -object -> -> > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain- -> -> > > > 194- -> -> > > > Linux/master-key.aes -machine -> -> > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu -> -> > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m -> -> > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp -> -> > > > 20,sockets=20,cores=1,threads=1 -numa -> -> > > > node,nodeid=0,cpus=0-4,mem=1024 -numa -> -> > > > node,nodeid=1,cpus=5-9,mem=1024 -numa -> -> > > > node,nodeid=2,cpus=10-14,mem=1024 -numa -> -> > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid -> -> > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults -> -> > > > -chardev -> -> > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/ -> -> > > > moni -> -> > > > tor.sock,server,nowait -mon -> -> > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet -> -> > > > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot -> -> > > > strict=on -device -> -> > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device -> -> > > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device -> -> > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device -> -> > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device -> -> > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device -> -> > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device -> -> > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device -> -> > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device -> -> > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device -> -> > > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device -> -> > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device -> -> > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device -> -> > > > 
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive -> -> > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=dri -> -> > > > ve-v -> -> > > > irtio-disk0,cache=none -device -> -> > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk -> -> > > > 0,id -> -> > > > =virtio-disk0,bootindex=1 -drive -> -> > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device -> -> > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev -> -> > > > tap,fd=35,id=hostnet0 -device -> -> > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=p -> -> > > > ci.4 -> -> > > > ,addr=0x1 -chardev pty,id=charserial0 -device -> -> > > > isa-serial,chardev=charserial0,id=serial0 -device -> -> > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device -> -> > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device -> -> > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg -> -> > > > timestamp=on -> -> > > > -> -> > > > I am also very curious about this issue, in the linux kernel code, -> -> > > > maybe double -> -> > > check in function pci_bridge_check_ranges triggered this problem. -> -> > > -> -> > > If you can get the stacktrace in Linux when it tries to write this -> -> > > fffff value, that would be quite helpful. 
-> -> > > -> -> > -> -> > After I add mdelay(100) in function pci_bridge_check_ranges, this -> -> > phenomenon is easier to reproduce, below is my modify in kernel: -> -> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index -> -> > cb389277..86e232d 100644 -> -> > --- a/drivers/pci/setup-bus.c -> -> > +++ b/drivers/pci/setup-bus.c -> -> > @@ -27,7 +27,7 @@ -> -> > #include <linux/slab.h> -> -> > #include <linux/acpi.h> -> -> > #include "pci.h" -> -> > - -> -> > +#include <linux/delay.h> -> -> > unsigned int pci_flags; -> -> > -> -> > struct pci_dev_resource { -> -> > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus -> -> *bus) -> -> > pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, -> -> > 0xffffffff); -> -> > pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, -> -> > &tmp); -> -> > + mdelay(100); -> -> > + printk(KERN_ERR "sleep\n"); -> -> > + dump_stack(); -> -> > if (!tmp) -> -> > b_res[2].flags &= ~IORESOURCE_MEM_64; -> -> > pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, -> -> > -> -> -> -> OK! -> -> I just sent a Linux patch that should help. -> -> I would appreciate it if you will give it a try and if that helps reply to -> -> it with a -> -> Tested-by: tag. -> -> -> -> -I tested this patch and it works fine on my machine. -> -> -But I have another question, if we only fix this problem in the kernel, the -> -Linux -> -version that has been released does not work well on the virtualization -> -platform. -> -Is there a way to fix this problem in the backend? -There could we a way to work around this. -Does below help? 
- -diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c -index 236a20eaa8..7834cac4b0 100644 ---- a/hw/i386/acpi-build.c -+++ b/hw/i386/acpi-build.c -@@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml *parent_scope, -PCIBus *bus, - - aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM"))); - aml_append(method, -- aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check */) -+ aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device Check -Light */) - ); - aml_append(method, - aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request */) - -> -On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote: -> -> On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote: -> -> > > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: -> -> > > > > > > Hi all, -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > In our test, we configured VM with several pci-bridges and -> -> > > > > > > a virtio-net nic been attached with bus 4, -> -> > > > > > > -> -> > > > > > > After VM is startup, We ping this nic from host to judge -> -> > > > > > > if it is working normally. Then, we hot add pci devices to -> -> > > > > > > this VM with bus -> -> > 0. 
-> -> > > > > > > -> -> > > > > > > We found the virtio-net NIC in bus 4 is not working (can -> -> > > > > > > not -> -> > > > > > > connect) occasionally, as it kick virtio backend failure with -> -> > > > > > > error -> -below: -> -> > > > > > > -> -> > > > > > > Unassigned mem write 00000000fc803004 = 0x1 -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > memory-region: pci_bridge_pci -> -> > > > > > > -> -> > > > > > > 0000000000000000-ffffffffffffffff (prio 0, RW): -> -> > > > > > > pci_bridge_pci -> -> > > > > > > -> -> > > > > > > 00000000fc800000-00000000fc803fff (prio 1, RW): -> -> > > > > > > virtio-pci -> -> > > > > > > -> -> > > > > > > 00000000fc800000-00000000fc800fff (prio 0, RW): -> -> > > > > > > virtio-pci-common -> -> > > > > > > -> -> > > > > > > 00000000fc801000-00000000fc801fff (prio 0, RW): -> -> > > > > > > virtio-pci-isr -> -> > > > > > > -> -> > > > > > > 00000000fc802000-00000000fc802fff (prio 0, RW): -> -> > > > > > > virtio-pci-device -> -> > > > > > > -> -> > > > > > > 00000000fc803000-00000000fc803fff (prio 0, RW): -> -> > > > > > > virtio-pci-notify <- io mem unassigned -> -> > > > > > > -> -> > > > > > > ... -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > We caught an exceptional address changing while this -> -> > > > > > > problem happened, show as -> -> > > > > > > follow: -> -> > > > > > > -> -> > > > > > > Before pci_bridge_update_mappings: -> -> > > > > > > -> -> > > > > > > 00000000fc000000-00000000fc1fffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > 00000000fc000000-00000000fc1fffff -> -> > > > > > > -> -> > > > > > > 00000000fc200000-00000000fc3fffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > 00000000fc200000-00000000fc3fffff -> -> > > > > > > -> -> > > > > > > 00000000fc400000-00000000fc5fffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > 
00000000fc400000-00000000fc5fffff -> -> > > > > > > -> -> > > > > > > 00000000fc600000-00000000fc7fffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > 00000000fc600000-00000000fc7fffff -> -> > > > > > > -> -> > > > > > > 00000000fc800000-00000000fc9fffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > 00000000fc800000-00000000fc9fffff -> -> > > > > > > <- correct Adress Spce -> -> > > > > > > -> -> > > > > > > 00000000fca00000-00000000fcbfffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > 00000000fca00000-00000000fcbfffff -> -> > > > > > > -> -> > > > > > > 00000000fcc00000-00000000fcdfffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > 00000000fcc00000-00000000fcdfffff -> -> > > > > > > -> -> > > > > > > 00000000fce00000-00000000fcffffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > 00000000fce00000-00000000fcffffff -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > After pci_bridge_update_mappings: -> -> > > > > > > -> -> > > > > > > 00000000fda00000-00000000fdbfffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > 00000000fda00000-00000000fdbfffff -> -> > > > > > > -> -> > > > > > > 00000000fdc00000-00000000fddfffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > 00000000fdc00000-00000000fddfffff -> -> > > > > > > -> -> > > > > > > 00000000fde00000-00000000fdffffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > 00000000fde00000-00000000fdffffff -> -> > > > > > > -> -> > > > > > > 00000000fe000000-00000000fe1fffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > 00000000fe000000-00000000fe1fffff -> -> > > > > > > -> -> > > > > > > 00000000fe200000-00000000fe3fffff 
(prio 1, RW): -> -> > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > 00000000fe200000-00000000fe3fffff -> -> > > > > > > -> -> > > > > > > 00000000fe400000-00000000fe5fffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > 00000000fe400000-00000000fe5fffff -> -> > > > > > > -> -> > > > > > > 00000000fe600000-00000000fe7fffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > 00000000fe600000-00000000fe7fffff -> -> > > > > > > -> -> > > > > > > 00000000fe800000-00000000fe9fffff (prio 1, RW): -> -> > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > 00000000fe800000-00000000fe9fffff -> -> > > > > > > -> -> > > > > > > fffffffffc800000-fffffffffc800000 (prio 1, RW): -> -> > > > > > > alias -> -> > > > pci_bridge_pref_mem -> -> > > > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- -> -> > > > > > > Exceptional -> -> > Adress -> -> > > > > > Space -> -> > > > > > -> -> > > > > > This one is empty though right? -> -> > > > > > -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > We have figured out why this address becomes this value, -> -> > > > > > > according to pci spec, pci driver can get BAR address -> -> > > > > > > size by writing 0xffffffff to -> -> > > > > > > -> -> > > > > > > the pci register firstly, and then read back the value from this -> -register. -> -> > > > > > -> -> > > > > > -> -> > > > > > OK however as you show below the BAR being sized is the BAR -> -> > > > > > if a bridge. Are you then adding a bridge device by hotplug? -> -> > > > > -> -> > > > > No, I just simply hot plugged a VFIO device to Bus 0, another -> -> > > > > interesting phenomenon is If I hot plug the device to other -> -> > > > > bus, this doesn't -> -> > > > happened. 
-> -> > > > > -> -> > > > > > -> -> > > > > > -> -> > > > > > > We didn't handle this value specially while process pci -> -> > > > > > > write in qemu, the function call stack is: -> -> > > > > > > -> -> > > > > > > Pci_bridge_dev_write_config -> -> > > > > > > -> -> > > > > > > -> pci_bridge_write_config -> -> > > > > > > -> -> > > > > > > -> pci_default_write_config (we update the config[address] -> -> > > > > > > -> value here to -> -> > > > > > > fffffffffc800000, which should be 0xfc800000 ) -> -> > > > > > > -> -> > > > > > > -> pci_bridge_update_mappings -> -> > > > > > > -> -> > > > > > > ->pci_bridge_region_del(br, br->windows); -> -> > > > > > > -> -> > > > > > > -> pci_bridge_region_init -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use -> -> > > > > > > -> the -> -> > > > > > > wrong value -> -> > > > > > > fffffffffc800000) -> -> > > > > > > -> -> > > > > > > -> -> -> > > > > > > memory_region_transaction_commit -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > So, as we can see, we use the wrong base address in qemu -> -> > > > > > > to update the memory regions, though, we update the base -> -> > > > > > > address to -> -> > > > > > > -> -> > > > > > > The correct value after pci driver in VM write the -> -> > > > > > > original value back, the virtio NIC in bus 4 may still -> -> > > > > > > sends net packets concurrently with -> -> > > > > > > -> -> > > > > > > The wrong memory region address. -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > We have tried to skip the memory region update action in -> -> > > > > > > qemu while detect pci write with 0xffffffff value, and it -> -> > > > > > > does work, but -> -> > > > > > > -> -> > > > > > > This seems to be not gently. -> -> > > > > > -> -> > > > > > For sure. But I'm still puzzled as to why does Linux try to -> -> > > > > > size the BAR of the bridge while a device behind it is used. 
-> -> > > > > > -> -> > > > > > Can you pls post your QEMU command line? -> -> > > > > -> -> > > > > My QEMU command line: -> -> > > > > /root/xyd/qemu-system-x86_64 -name -> -> > > > > guest=Linux,debug-threads=on -S -object -> -> > > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/dom -> -> > > > > ain- -> -> > > > > 194- -> -> > > > > Linux/master-key.aes -machine -> -> > > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu -> -> > > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m -> -> > > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -> -> > > > > -smp -> -> > > > > 20,sockets=20,cores=1,threads=1 -numa -> -> > > > > node,nodeid=0,cpus=0-4,mem=1024 -numa -> -> > > > > node,nodeid=1,cpus=5-9,mem=1024 -numa -> -> > > > > node,nodeid=2,cpus=10-14,mem=1024 -numa -> -> > > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid -> -> > > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -> -> > > > > -nodefaults -chardev -> -> > > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Li -> -> > > > > nux/ -> -> > > > > moni -> -> > > > > tor.sock,server,nowait -mon -> -> > > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -> -> > > > > -no-hpet -global kvm-pit.lost_tick_policy=delay -no-shutdown -> -> > > > > -boot strict=on -device -> -> > > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device -> -> > > > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device -> -> > > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device -> -> > > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device -> -> > > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device -> -> > > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device -> -> > > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device -> -> > > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device -> -> > > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device -> -> > > > > 
virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device -> -> > > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device -> -> > > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device -> -> > > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive -> -> > > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id -> -> > > > > =dri -> -> > > > > ve-v -> -> > > > > irtio-disk0,cache=none -device -> -> > > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio- -> -> > > > > disk -> -> > > > > 0,id -> -> > > > > =virtio-disk0,bootindex=1 -drive -> -> > > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device -> -> > > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -> -> > > > > -netdev -> -> > > > > tap,fd=35,id=hostnet0 -device -> -> > > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,b -> -> > > > > us=p -> -> > > > > ci.4 -> -> > > > > ,addr=0x1 -chardev pty,id=charserial0 -device -> -> > > > > isa-serial,chardev=charserial0,id=serial0 -device -> -> > > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device -> -> > > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device -> -> > > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg -> -> > > > > timestamp=on -> -> > > > > -> -> > > > > I am also very curious about this issue, in the linux kernel -> -> > > > > code, maybe double -> -> > > > check in function pci_bridge_check_ranges triggered this problem. -> -> > > > -> -> > > > If you can get the stacktrace in Linux when it tries to write -> -> > > > this fffff value, that would be quite helpful. 
-> -> > > > -> -> > > -> -> > > After I add mdelay(100) in function pci_bridge_check_ranges, this -> -> > > phenomenon is easier to reproduce, below is my modify in kernel: -> -> > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c -> -> > > index cb389277..86e232d 100644 -> -> > > --- a/drivers/pci/setup-bus.c -> -> > > +++ b/drivers/pci/setup-bus.c -> -> > > @@ -27,7 +27,7 @@ -> -> > > #include <linux/slab.h> -> -> > > #include <linux/acpi.h> -> -> > > #include "pci.h" -> -> > > - -> -> > > +#include <linux/delay.h> -> -> > > unsigned int pci_flags; -> -> > > -> -> > > struct pci_dev_resource { -> -> > > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct -> -> > > pci_bus -> -> > *bus) -> -> > > pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, -> -> > > 0xffffffff); -> -> > > pci_read_config_dword(bridge, -> -> > > PCI_PREF_BASE_UPPER32, &tmp); -> -> > > + mdelay(100); -> -> > > + printk(KERN_ERR "sleep\n"); -> -> > > + dump_stack(); -> -> > > if (!tmp) -> -> > > b_res[2].flags &= ~IORESOURCE_MEM_64; -> -> > > pci_write_config_dword(bridge, -> -> > > PCI_PREF_BASE_UPPER32, -> -> > > -> -> > -> -> > OK! -> -> > I just sent a Linux patch that should help. -> -> > I would appreciate it if you will give it a try and if that helps -> -> > reply to it with a -> -> > Tested-by: tag. -> -> > -> -> -> -> I tested this patch and it works fine on my machine. -> -> -> -> But I have another question, if we only fix this problem in the -> -> kernel, the Linux version that has been released does not work well on the -> -virtualization platform. -> -> Is there a way to fix this problem in the backend? -> -> -There could we a way to work around this. -> -Does below help? -I am sorry to tell you, I tested this patch and it doesn't work fine. 
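For readers following the thread: the fffffffffc800000 value reported above is what results when the bridge's prefetchable window base is assembled while PCI_PREF_BASE_UPPER32 still holds the 0xffffffff written by pci_bridge_check_ranges. The following is a minimal sketch in C, not QEMU's or Linux's actual code. The helper name pref_base is hypothetical, and the 0xfc81 value for the lower base register is an assumption chosen to match the fc800000 window in the logs (bits 15:4 carry address bits 31:20, and a low nibble of 1 marks the window as 64-bit capable).

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine the two bridge config registers that
 * describe the prefetchable memory window base into one 64-bit address.
 *   lower16 - Prefetchable Memory Base register (address bits 31:20 in bits 15:4)
 *   upper32 - Prefetchable Base Upper 32 Bits register (address bits 63:32)
 */
static uint64_t pref_base(uint16_t lower16, uint32_t upper32)
{
    /* Mask off the low nibble (capability bits), then shift into bits 31:20. */
    uint64_t base = (uint64_t)(lower16 & 0xfff0u) << 16;

    /* The upper register supplies bits 63:32 unchanged. */
    base |= (uint64_t)upper32 << 32;
    return base;
}
```

With the assumed lower value 0xfc81, pref_base(0xfc81, 0x00000000) gives 0x00000000fc800000, while pref_base(0xfc81, 0xffffffff) gives 0xfffffffffc800000, the same address that shows up in the memory-region dump during the window in which the guest has written all ones but not yet restored the original value.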
- -> -> -diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index -> -236a20eaa8..7834cac4b0 100644 -> ---- a/hw/i386/acpi-build.c -> -+++ b/hw/i386/acpi-build.c -> -@@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml -> -*parent_scope, PCIBus *bus, -> -> -aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM"))); -> -aml_append(method, -> -- aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check -> -*/) -> -+ aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device -> -+ Check Light */) -> -); -> -aml_append(method, -> -aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request -> -*/) - -On Tue, Dec 11, 2018 at 03:51:09AM +0000, xuyandong wrote: -> -> There could we a way to work around this. -> -> Does below help? -> -> -I am sorry to tell you, I tested this patch and it doesn't work fine. -What happens? - -> -> -> -> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index -> -> 236a20eaa8..7834cac4b0 100644 -> -> --- a/hw/i386/acpi-build.c -> -> +++ b/hw/i386/acpi-build.c -> -> @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml -> -> *parent_scope, PCIBus *bus, -> -> -> -> aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM"))); -> -> aml_append(method, -> -> - aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check -> -> */) -> -> + aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device -> -> + Check Light */) -> -> ); -> -> aml_append(method, -> -> aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request -> -> */) - -On Tue, Dec 11, 2018 at 03:51:09AM +0000, xuyandong wrote: -> -> On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote: -> -> > On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote: -> -> > > > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: -> -> > > > > > > > Hi all, -> -> > > > > > > > -> -> > > > > > > > -> -> > > > > > > > -> -> > > > > > > > In our test, we configured VM with several pci-bridges and -> -> > > > > > > > a virtio-net nic 
been attached with bus 4, -> -> > > > > > > > -> -> > > > > > > > After VM is startup, We ping this nic from host to judge -> -> > > > > > > > if it is working normally. Then, we hot add pci devices to -> -> > > > > > > > this VM with bus -> -> > > 0. -> -> > > > > > > > -> -> > > > > > > > We found the virtio-net NIC in bus 4 is not working (can -> -> > > > > > > > not -> -> > > > > > > > connect) occasionally, as it kick virtio backend failure with -> -> > > > > > > > error -> -> below: -> -> > > > > > > > -> -> > > > > > > > Unassigned mem write 00000000fc803004 = 0x1 -> -> > > > > > > > -> -> > > > > > > > -> -> > > > > > > > -> -> > > > > > > > memory-region: pci_bridge_pci -> -> > > > > > > > -> -> > > > > > > > 0000000000000000-ffffffffffffffff (prio 0, RW): -> -> > > > > > > > pci_bridge_pci -> -> > > > > > > > -> -> > > > > > > > 00000000fc800000-00000000fc803fff (prio 1, RW): -> -> > > > > > > > virtio-pci -> -> > > > > > > > -> -> > > > > > > > 00000000fc800000-00000000fc800fff (prio 0, RW): -> -> > > > > > > > virtio-pci-common -> -> > > > > > > > -> -> > > > > > > > 00000000fc801000-00000000fc801fff (prio 0, RW): -> -> > > > > > > > virtio-pci-isr -> -> > > > > > > > -> -> > > > > > > > 00000000fc802000-00000000fc802fff (prio 0, RW): -> -> > > > > > > > virtio-pci-device -> -> > > > > > > > -> -> > > > > > > > 00000000fc803000-00000000fc803fff (prio 0, RW): -> -> > > > > > > > virtio-pci-notify <- io mem unassigned -> -> > > > > > > > -> -> > > > > > > > ... -> -> > > > > > > > -> -> > > > > > > > -> -> > > > > > > > -> -> > > > > > > > We caught an exceptional address changing while this -> -> > > > > > > > problem happened, show as -> -> > > > > > > > follow: -> -> > > > > > > > -> -> > > > > > > > Before pci_bridge_update_mappings: -> -> > > > > > > > -> -> > > > > > > > 00000000fc000000-00000000fc1fffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > > 00000000fc000000-00000000fc1fffff -> -> > > > > > > > 
-> -> > > > > > > > 00000000fc200000-00000000fc3fffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > > 00000000fc200000-00000000fc3fffff -> -> > > > > > > > -> -> > > > > > > > 00000000fc400000-00000000fc5fffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > > 00000000fc400000-00000000fc5fffff -> -> > > > > > > > -> -> > > > > > > > 00000000fc600000-00000000fc7fffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > > 00000000fc600000-00000000fc7fffff -> -> > > > > > > > -> -> > > > > > > > 00000000fc800000-00000000fc9fffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > > 00000000fc800000-00000000fc9fffff -> -> > > > > > > > <- correct Adress Spce -> -> > > > > > > > -> -> > > > > > > > 00000000fca00000-00000000fcbfffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > > 00000000fca00000-00000000fcbfffff -> -> > > > > > > > -> -> > > > > > > > 00000000fcc00000-00000000fcdfffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > > 00000000fcc00000-00000000fcdfffff -> -> > > > > > > > -> -> > > > > > > > 00000000fce00000-00000000fcffffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci -> -> > > > > > > > 00000000fce00000-00000000fcffffff -> -> > > > > > > > -> -> > > > > > > > -> -> > > > > > > > -> -> > > > > > > > After pci_bridge_update_mappings: -> -> > > > > > > > -> -> > > > > > > > 00000000fda00000-00000000fdbfffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > > 00000000fda00000-00000000fdbfffff -> -> > > > > > > > -> -> > > > > > > > 00000000fdc00000-00000000fddfffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > > 00000000fdc00000-00000000fddfffff -> -> > > > > > > > -> -> > > > > > 
> > 00000000fde00000-00000000fdffffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > > 00000000fde00000-00000000fdffffff -> -> > > > > > > > -> -> > > > > > > > 00000000fe000000-00000000fe1fffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > > 00000000fe000000-00000000fe1fffff -> -> > > > > > > > -> -> > > > > > > > 00000000fe200000-00000000fe3fffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > > 00000000fe200000-00000000fe3fffff -> -> > > > > > > > -> -> > > > > > > > 00000000fe400000-00000000fe5fffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > > 00000000fe400000-00000000fe5fffff -> -> > > > > > > > -> -> > > > > > > > 00000000fe600000-00000000fe7fffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > > 00000000fe600000-00000000fe7fffff -> -> > > > > > > > -> -> > > > > > > > 00000000fe800000-00000000fe9fffff (prio 1, RW): -> -> > > > > > > > alias pci_bridge_mem @pci_bridge_pci -> -> > > > > > > > 00000000fe800000-00000000fe9fffff -> -> > > > > > > > -> -> > > > > > > > fffffffffc800000-fffffffffc800000 (prio 1, RW): -> -> > > > > > > > alias -> -> > > > > pci_bridge_pref_mem -> -> > > > > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- -> -> > > > > > > > Exceptional -> -> > > Adress -> -> > > > > > > Space -> -> > > > > > > -> -> > > > > > > This one is empty though right? -> -> > > > > > > -> -> > > > > > > > -> -> > > > > > > > -> -> > > > > > > > We have figured out why this address becomes this value, -> -> > > > > > > > according to pci spec, pci driver can get BAR address -> -> > > > > > > > size by writing 0xffffffff to -> -> > > > > > > > -> -> > > > > > > > the pci register firstly, and then read back the value from -> -> > > > > > > > this -> -> register. 
-> -> > > > > > > -> -> > > > > > > -> -> > > > > > > OK however as you show below the BAR being sized is the BAR -> -> > > > > > > if a bridge. Are you then adding a bridge device by hotplug? -> -> > > > > > -> -> > > > > > No, I just simply hot plugged a VFIO device to Bus 0, another -> -> > > > > > interesting phenomenon is If I hot plug the device to other -> -> > > > > > bus, this doesn't -> -> > > > > happened. -> -> > > > > > -> -> > > > > > > -> -> > > > > > > -> -> > > > > > > > We didn't handle this value specially while process pci -> -> > > > > > > > write in qemu, the function call stack is: -> -> > > > > > > > -> -> > > > > > > > Pci_bridge_dev_write_config -> -> > > > > > > > -> -> > > > > > > > -> pci_bridge_write_config -> -> > > > > > > > -> -> > > > > > > > -> pci_default_write_config (we update the config[address] -> -> > > > > > > > -> value here to -> -> > > > > > > > fffffffffc800000, which should be 0xfc800000 ) -> -> > > > > > > > -> -> > > > > > > > -> pci_bridge_update_mappings -> -> > > > > > > > -> -> > > > > > > > ->pci_bridge_region_del(br, br->windows); -> -> > > > > > > > -> -> > > > > > > > -> pci_bridge_region_init -> -> > > > > > > > -> -> > > > > > > > -> -> > > > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use -> -> > > > > > > > -> the -> -> > > > > > > > wrong value -> -> > > > > > > > fffffffffc800000) -> -> > > > > > > > -> -> > > > > > > > -> -> -> > > > > > > > memory_region_transaction_commit -> -> > > > > > > > -> -> > > > > > > > -> -> > > > > > > > -> -> > > > > > > > So, as we can see, we use the wrong base address in qemu -> -> > > > > > > > to update the memory regions, though, we update the base -> -> > > > > > > > address to -> -> > > > > > > > -> -> > > > > > > > The correct value after pci driver in VM write the -> -> > > > > > > > original value back, the virtio NIC in bus 4 may still -> -> > > > > > > > sends net packets concurrently with -> -> > > > > > > > -> -> > > > > > > > The wrong 
memory region address. -> -> > > > > > > > -> -> > > > > > > > -> -> > > > > > > > -> -> > > > > > > > We have tried to skip the memory region update action in -> -> > > > > > > > qemu while detect pci write with 0xffffffff value, and it -> -> > > > > > > > does work, but -> -> > > > > > > > -> -> > > > > > > > This seems to be not gently. -> -> > > > > > > -> -> > > > > > > For sure. But I'm still puzzled as to why does Linux try to -> -> > > > > > > size the BAR of the bridge while a device behind it is used. -> -> > > > > > > -> -> > > > > > > Can you pls post your QEMU command line? -> -> > > > > > -> -> > > > > > My QEMU command line: -> -> > > > > > /root/xyd/qemu-system-x86_64 -name -> -> > > > > > guest=Linux,debug-threads=on -S -object -> -> > > > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/dom -> -> > > > > > ain- -> -> > > > > > 194- -> -> > > > > > Linux/master-key.aes -machine -> -> > > > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu -> -> > > > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m -> -> > > > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -> -> > > > > > -smp -> -> > > > > > 20,sockets=20,cores=1,threads=1 -numa -> -> > > > > > node,nodeid=0,cpus=0-4,mem=1024 -numa -> -> > > > > > node,nodeid=1,cpus=5-9,mem=1024 -numa -> -> > > > > > node,nodeid=2,cpus=10-14,mem=1024 -numa -> -> > > > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid -> -> > > > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -> -> > > > > > -nodefaults -chardev -> -> > > > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Li -> -> > > > > > nux/ -> -> > > > > > moni -> -> > > > > > tor.sock,server,nowait -mon -> -> > > > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -> -> > > > > > -no-hpet -global kvm-pit.lost_tick_policy=delay -no-shutdown -> -> > > > > > -boot strict=on -device -> -> > > > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device -> -> > > > > > 
pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device -> -> > > > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device -> -> > > > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device -> -> > > > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device -> -> > > > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device -> -> > > > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device -> -> > > > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device -> -> > > > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device -> -> > > > > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device -> -> > > > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device -> -> > > > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device -> -> > > > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive -> -> > > > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id -> -> > > > > > =dri -> -> > > > > > ve-v -> -> > > > > > irtio-disk0,cache=none -device -> -> > > > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio- -> -> > > > > > disk -> -> > > > > > 0,id -> -> > > > > > =virtio-disk0,bootindex=1 -drive -> -> > > > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device -> -> > > > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -> -> > > > > > -netdev -> -> > > > > > tap,fd=35,id=hostnet0 -device -> -> > > > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,b -> -> > > > > > us=p -> -> > > > > > ci.4 -> -> > > > > > ,addr=0x1 -chardev pty,id=charserial0 -device -> -> > > > > > isa-serial,chardev=charserial0,id=serial0 -device -> -> > > > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device -> -> > > > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device -> -> > > > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg -> -> > > > > > timestamp=on -> -> > > > > > -> -> > > > > > I am also very curious about this issue, in the linux kernel -> -> > > > > > 
code, maybe double -> -> > > > > check in function pci_bridge_check_ranges triggered this problem. -> -> > > > > -> -> > > > > If you can get the stacktrace in Linux when it tries to write -> -> > > > > this fffff value, that would be quite helpful. -> -> > > > > -> -> > > > -> -> > > > After I add mdelay(100) in function pci_bridge_check_ranges, this -> -> > > > phenomenon is easier to reproduce, below is my modify in kernel: -> -> > > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c -> -> > > > index cb389277..86e232d 100644 -> -> > > > --- a/drivers/pci/setup-bus.c -> -> > > > +++ b/drivers/pci/setup-bus.c -> -> > > > @@ -27,7 +27,7 @@ -> -> > > > #include <linux/slab.h> -> -> > > > #include <linux/acpi.h> -> -> > > > #include "pci.h" -> -> > > > - -> -> > > > +#include <linux/delay.h> -> -> > > > unsigned int pci_flags; -> -> > > > -> -> > > > struct pci_dev_resource { -> -> > > > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct -> -> > > > pci_bus -> -> > > *bus) -> -> > > > pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, -> -> > > > 0xffffffff); -> -> > > > pci_read_config_dword(bridge, -> -> > > > PCI_PREF_BASE_UPPER32, &tmp); -> -> > > > + mdelay(100); -> -> > > > + printk(KERN_ERR "sleep\n"); -> -> > > > + dump_stack(); -> -> > > > if (!tmp) -> -> > > > b_res[2].flags &= ~IORESOURCE_MEM_64; -> -> > > > pci_write_config_dword(bridge, -> -> > > > PCI_PREF_BASE_UPPER32, -> -> > > > -> -> > > -> -> > > OK! -> -> > > I just sent a Linux patch that should help. -> -> > > I would appreciate it if you will give it a try and if that helps -> -> > > reply to it with a -> -> > > Tested-by: tag. -> -> > > -> -> > -> -> > I tested this patch and it works fine on my machine. -> -> > -> -> > But I have another question, if we only fix this problem in the -> -> > kernel, the Linux version that has been released does not work well on the -> -> virtualization platform. -> -> > Is there a way to fix this problem in the backend? 
-> -> -> -> There could we a way to work around this. -> -> Does below help? -> -> -I am sorry to tell you, I tested this patch and it doesn't work fine. -> -> -> -> -> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index -> -> 236a20eaa8..7834cac4b0 100644 -> -> --- a/hw/i386/acpi-build.c -> -> +++ b/hw/i386/acpi-build.c -> -> @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml -> -> *parent_scope, PCIBus *bus, -> -> -> -> aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM"))); -> -> aml_append(method, -> -> - aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check -> -> */) -> -> + aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device -> -> + Check Light */) -> -> ); -> -> aml_append(method, -> -> aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request -> -> */) -Oh I see, another bug: - - case ACPI_NOTIFY_DEVICE_CHECK_LIGHT: - acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT -event\n"); - /* TBD: Exactly what does 'light' mean? */ - break; - -And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type) -and friends all just ignore this event type. - - - --- -MST - -> -> > > > > > > > > Hi all, -> -> > > > > > > > > -> -> > > > > > > > > -> -> > > > > > > > > -> -> > > > > > > > > In our test, we configured VM with several pci-bridges -> -> > > > > > > > > and a virtio-net nic been attached with bus 4, -> -> > > > > > > > > -> -> > > > > > > > > After VM is startup, We ping this nic from host to -> -> > > > > > > > > judge if it is working normally. Then, we hot add pci -> -> > > > > > > > > devices to this VM with bus -> -> > > > 0. 
-> -> > > > > > > > > -> -> > > > > > > > > We found the virtio-net NIC in bus 4 is not working -> -> > > > > > > > > (can not -> -> > > > > > > > > connect) occasionally, as it kick virtio backend -> -> > > > > > > > > failure with error -> -> > > But I have another question, if we only fix this problem in the -> -> > > kernel, the Linux version that has been released does not work -> -> > > well on the -> -> > virtualization platform. -> -> > > Is there a way to fix this problem in the backend? -> -> > -> -> > There could we a way to work around this. -> -> > Does below help? -> -> -> -> I am sorry to tell you, I tested this patch and it doesn't work fine. -> -> -> -> > -> -> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index -> -> > 236a20eaa8..7834cac4b0 100644 -> -> > --- a/hw/i386/acpi-build.c -> -> > +++ b/hw/i386/acpi-build.c -> -> > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml -> -> > *parent_scope, PCIBus *bus, -> -> > -> -> > aml_append(method, aml_store(aml_int(bsel_val), -> -aml_name("BNUM"))); -> -> > aml_append(method, -> -> > - aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device -> -> > Check -> -*/) -> -> > + aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* -> -> > + Device Check Light */) -> -> > ); -> -> > aml_append(method, -> -> > aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject -> -> > Request */) -> -> -> -Oh I see, another bug: -> -> -case ACPI_NOTIFY_DEVICE_CHECK_LIGHT: -> -acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT -> -event\n"); -> -/* TBD: Exactly what does 'light' mean? */ -> -break; -> -> -And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type) -> -and friends all just ignore this event type. 
-> -> -> -> --- -> -MST -Hi Michael, - -If we want to fix this problem on the backend, it is not enough to consider -only PCI -device hot plugging, because I found that if we use a command like -"echo 1 > /sys/bus/pci/rescan" in guest, this problem is very easy to reproduce. - -From the perspective of device emulation, when guest writes 0xffffffff to the -BAR, -guest just want to get the size of the region but not really updating the -address space. -So I made the following patch to avoid update pci mapping. - -Do you think this make sense? - -[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR - -When guest writes 0xffffffff to the BAR, guest just want to get the size of the -region -but not really updating the address space. -So when guest writes 0xffffffff to BAR, we need avoid pci_update_mappings -or pci_bridge_update_mappings. - -Signed-off-by: xuyandong <address@hidden> ---- - hw/pci/pci.c | 6 ++++-- - hw/pci/pci_bridge.c | 8 +++++--- - 2 files changed, 9 insertions(+), 5 deletions(-) - -diff --git a/hw/pci/pci.c b/hw/pci/pci.c -index 56b13b3..ef368e1 100644 ---- a/hw/pci/pci.c -+++ b/hw/pci/pci.c -@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t -addr, uint32_t val_in, int - { - int i, was_irq_disabled = pci_irq_disabled(d); - uint32_t val = val_in; -+ uint64_t barmask = (1 << l*8) - 1; - - for (i = 0; i < l; val >>= 8, ++i) { - uint8_t wmask = d->wmask[addr + i]; -@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t -addr, uint32_t val_in, int - d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask); - d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */ - } -- if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || -+ if ((val_in != barmask && -+ (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || - ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || -- ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || -+ ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) || - 
range_covers_byte(addr, l, PCI_COMMAND)) - pci_update_mappings(d); - -diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c -index ee9dff2..f2bad79 100644 ---- a/hw/pci/pci_bridge.c -+++ b/hw/pci/pci_bridge.c -@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d, - PCIBridge *s = PCI_BRIDGE(d); - uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL); - uint16_t newctl; -+ uint64_t barmask = (1 << len * 8) - 1; - - pci_default_write_config(d, address, val, len); - - if (ranges_overlap(address, len, PCI_COMMAND, 2) || - -- /* io base/limit */ -- ranges_overlap(address, len, PCI_IO_BASE, 2) || -+ (val != barmask && -+ /* io base/limit */ -+ (ranges_overlap(address, len, PCI_IO_BASE, 2) || - - /* memory base/limit, prefetchable base/limit and - io base/limit upper 16 */ -- ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || -+ ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) || - - /* vga enable */ - ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { --- -1.8.3.1 - -On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote: -> -> > > > > > > > > > Hi all, -> -> > > > > > > > > > -> -> > > > > > > > > > -> -> > > > > > > > > > -> -> > > > > > > > > > In our test, we configured VM with several pci-bridges -> -> > > > > > > > > > and a virtio-net nic been attached with bus 4, -> -> > > > > > > > > > -> -> > > > > > > > > > After VM is startup, We ping this nic from host to -> -> > > > > > > > > > judge if it is working normally. Then, we hot add pci -> -> > > > > > > > > > devices to this VM with bus -> -> > > > > 0. 
-> -> > > > > > > > > > -> -> > > > > > > > > > We found the virtio-net NIC in bus 4 is not working -> -> > > > > > > > > > (can not -> -> > > > > > > > > > connect) occasionally, as it kick virtio backend -> -> > > > > > > > > > failure with error -> -> -> > > > But I have another question, if we only fix this problem in the -> -> > > > kernel, the Linux version that has been released does not work -> -> > > > well on the -> -> > > virtualization platform. -> -> > > > Is there a way to fix this problem in the backend? -> -> > > -> -> > > There could we a way to work around this. -> -> > > Does below help? -> -> > -> -> > I am sorry to tell you, I tested this patch and it doesn't work fine. -> -> > -> -> > > -> -> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index -> -> > > 236a20eaa8..7834cac4b0 100644 -> -> > > --- a/hw/i386/acpi-build.c -> -> > > +++ b/hw/i386/acpi-build.c -> -> > > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml -> -> > > *parent_scope, PCIBus *bus, -> -> > > -> -> > > aml_append(method, aml_store(aml_int(bsel_val), -> -> aml_name("BNUM"))); -> -> > > aml_append(method, -> -> > > - aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device -> -> > > Check -> -> */) -> -> > > + aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* -> -> > > + Device Check Light */) -> -> > > ); -> -> > > aml_append(method, -> -> > > aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject -> -> > > Request */) -> -> -> -> -> -> Oh I see, another bug: -> -> -> -> case ACPI_NOTIFY_DEVICE_CHECK_LIGHT: -> -> acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT -> -> event\n"); -> -> /* TBD: Exactly what does 'light' mean? */ -> -> break; -> -> -> -> And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type) -> -> and friends all just ignore this event type. 
-> -> -> -> -> -> -> -> -- -> -> MST -> -> -Hi Michael, -> -> -If we want to fix this problem on the backend, it is not enough to consider -> -only PCI -> -device hot plugging, because I found that if we use a command like -> -"echo 1 > /sys/bus/pci/rescan" in guest, this problem is very easy to -> -reproduce. -> -> -From the perspective of device emulation, when guest writes 0xffffffff to the -> -BAR, -> -guest just want to get the size of the region but not really updating the -> -address space. -> -So I made the following patch to avoid update pci mapping. -> -> -Do you think this make sense? -> -> -[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR -> -> -When guest writes 0xffffffff to the BAR, guest just want to get the size of -> -the region -> -but not really updating the address space. -> -So when guest writes 0xffffffff to BAR, we need avoid pci_update_mappings -> -or pci_bridge_update_mappings. -> -> -Signed-off-by: xuyandong <address@hidden> -I see how that will address the common case however there are a bunch of -issues here. First of all it's easy to trigger the update by some other -action like VM migration. More importantly it's just possible that -guest actually does want to set the low 32 bit of the address to all -ones. For example, that is clearly listed as a way to disable all -devices behind the bridge in the pci to pci bridge spec. - -Given upstream is dragging it's feet I'm open to adding a flag -that will help keep guests going as a temporary measure. -We will need to think about ways to restrict this as much as -we can. 
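The point under discussion can be illustrated with a minimal sketch. It assumes the standard PCI-to-PCI bridge layout, in which bits 15:4 of the 16-bit prefetchable base/limit registers hold address bits 31:20 and a separate 32-bit register holds the upper half. The helper names `pref_base`, `pref_limit` and `all_ones_mask` are hypothetical and are not QEMU functions; the register values 0xfc81 and 0xfc91 are assumed encodings of the fc800000-fc9fffff window shown earlier in the thread. The sketch shows how the sizing write of 0xffffffff to the upper-32 base register produces the transient base fffffffffc800000, why that window is empty (base above limit), and how an all-ones mask for a 4-byte access can be computed without shifting a 32-bit int by 32 bits, which `(1 << l*8) - 1` does when `l` is 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not QEMU code: build the 64-bit prefetchable base
 * from the 16-bit base register and the upper-32 register.
 * Bits 15:4 of the low register carry address bits 31:20. */
static uint64_t pref_base(uint16_t base_lo, uint32_t base_upper32)
{
    uint64_t base = (uint64_t)(base_lo & 0xfff0u) << 16;
    return base | ((uint64_t)base_upper32 << 32);
}

/* Hypothetical helper: same for the limit; the low 20 bits are all ones. */
static uint64_t pref_limit(uint16_t limit_lo, uint32_t limit_upper32)
{
    uint64_t limit = ((uint64_t)(limit_lo & 0xfff0u) << 16) | 0xfffffu;
    return limit | ((uint64_t)limit_upper32 << 32);
}

/* All-ones value for a config access of len bytes (1 to 4).
 * The shift is done on a 64-bit operand, so len == 4 is well defined. */
static uint64_t all_ones_mask(int len)
{
    assert(len >= 1 && len <= 4);
    return (UINT64_C(1) << (len * 8)) - 1;
}
```

With the assumed register values, `pref_base(0xfc81, 0xffffffff)` is above `pref_limit(0xfc91, 0)`, which matches the empty alias at fffffffffc800000 in the dump. Whether a given all-ones write is a sizing probe or a deliberate attempt to close the window cannot be told from the value alone, which is the concern raised in the reply above.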
- - -> ---- -> -hw/pci/pci.c | 6 ++++-- -> -hw/pci/pci_bridge.c | 8 +++++--- -> -2 files changed, 9 insertions(+), 5 deletions(-) -> -> -diff --git a/hw/pci/pci.c b/hw/pci/pci.c -> -index 56b13b3..ef368e1 100644 -> ---- a/hw/pci/pci.c -> -+++ b/hw/pci/pci.c -> -@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t -> -addr, uint32_t val_in, int -> -{ -> -int i, was_irq_disabled = pci_irq_disabled(d); -> -uint32_t val = val_in; -> -+ uint64_t barmask = (1 << l*8) - 1; -> -> -for (i = 0; i < l; val >>= 8, ++i) { -> -uint8_t wmask = d->wmask[addr + i]; -> -@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t -> -addr, uint32_t val_in, int -> -d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask); -> -d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */ -> -} -> -- if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || -> -+ if ((val_in != barmask && -> -+ (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || -> -ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || -> -- ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || -> -+ ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) || -> -range_covers_byte(addr, l, PCI_COMMAND)) -> -pci_update_mappings(d); -> -> -diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c -> -index ee9dff2..f2bad79 100644 -> ---- a/hw/pci/pci_bridge.c -> -+++ b/hw/pci/pci_bridge.c -> -@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d, -> -PCIBridge *s = PCI_BRIDGE(d); -> -uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL); -> -uint16_t newctl; -> -+ uint64_t barmask = (1 << len * 8) - 1; -> -> -pci_default_write_config(d, address, val, len); -> -> -if (ranges_overlap(address, len, PCI_COMMAND, 2) || -> -> -- /* io base/limit */ -> -- ranges_overlap(address, len, PCI_IO_BASE, 2) || -> -+ (val != barmask && -> -+ /* io base/limit */ -> -+ (ranges_overlap(address, len, PCI_IO_BASE, 2) || -> -> -/* memory base/limit, prefetchable base/limit and -> -io base/limit 
upper 16 */ -> -- ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || -> -+ ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) || -> -> -/* vga enable */ -> -ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { -> --- -> -1.8.3.1 -> -> -> - -> ------Original Message----- -> -From: Michael S. Tsirkin [ -mailto:address@hidden -> -Sent: Monday, January 07, 2019 11:06 PM -> -To: xuyandong <address@hidden> -> -Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu- -> -address@hidden; Zhanghailiang <address@hidden>; -> -wangxin (U) <address@hidden>; Huangweidong (C) -> -<address@hidden> -> -Subject: Re: [BUG]Unassigned mem write during pci device hot-plug -> -> -On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote: -> -> > > > > > > > > > > Hi all, -> -> > > > > > > > > > > -> -> > > > > > > > > > > -> -> > > > > > > > > > > -> -> > > > > > > > > > > In our test, we configured VM with several -> -> > > > > > > > > > > pci-bridges and a virtio-net nic been attached -> -> > > > > > > > > > > with bus 4, -> -> > > > > > > > > > > -> -> > > > > > > > > > > After VM is startup, We ping this nic from host to -> -> > > > > > > > > > > judge if it is working normally. Then, we hot add -> -> > > > > > > > > > > pci devices to this VM with bus -> -> > > > > > 0. -> -> > > > > > > > > > > -> -> > > > > > > > > > > We found the virtio-net NIC in bus 4 is not -> -> > > > > > > > > > > working (can not -> -> > > > > > > > > > > connect) occasionally, as it kick virtio backend -> -> > > > > > > > > > > failure with error -> -> -> -> > > > > But I have another question, if we only fix this problem in -> -> > > > > the kernel, the Linux version that has been released does not -> -> > > > > work well on the -> -> > > > virtualization platform. -> -> > > > > Is there a way to fix this problem in the backend? 
-> -> -> -> Hi Michael, -> -> -> -> If we want to fix this problem on the backend, it is not enough to -> -> consider only PCI device hot plugging, because I found that if we use -> -> a command like "echo 1 > /sys/bus/pci/rescan" in guest, this problem is very -> -easy to reproduce. -> -> -> -> From the perspective of device emulation, when guest writes 0xffffffff -> -> to the BAR, guest just want to get the size of the region but not really -> -updating the address space. -> -> So I made the following patch to avoid update pci mapping. -> -> -> -> Do you think this make sense? -> -> -> -> [PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR -> -> -> -> When guest writes 0xffffffff to the BAR, guest just want to get the -> -> size of the region but not really updating the address space. -> -> So when guest writes 0xffffffff to BAR, we need avoid -> -> pci_update_mappings or pci_bridge_update_mappings. -> -> -> -> Signed-off-by: xuyandong <address@hidden> -> -> -I see how that will address the common case however there are a bunch of -> -issues here. First of all it's easy to trigger the update by some other -> -action like -> -VM migration. More importantly it's just possible that guest actually does -> -want -> -to set the low 32 bit of the address to all ones. For example, that is -> -clearly -> -listed as a way to disable all devices behind the bridge in the pci to pci -> -bridge -> -spec. -Ok, I see. If I only skip upate when guest writing 0xFFFFFFFF to Prefetcable -Base Upper 32 Bits -to meet the kernel double check problem. -Do you think there is still risk? - -> -> -Given upstream is dragging it's feet I'm open to adding a flag that will help -> -keep guests going as a temporary measure. -> -We will need to think about ways to restrict this as much as we can. 
-> -> -> -> --- -> -> hw/pci/pci.c | 6 ++++-- -> -> hw/pci/pci_bridge.c | 8 +++++--- -> -> 2 files changed, 9 insertions(+), 5 deletions(-) -> -> -> -> diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644 -> -> --- a/hw/pci/pci.c -> -> +++ b/hw/pci/pci.c -> -> @@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, -> -> uint32_t addr, uint32_t val_in, int { -> -> int i, was_irq_disabled = pci_irq_disabled(d); -> -> uint32_t val = val_in; -> -> + uint64_t barmask = (1 << l*8) - 1; -> -> -> -> for (i = 0; i < l; val >>= 8, ++i) { -> -> uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@ -> -> void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, -> -int -> -> d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & -> -> wmask); -> -> d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear -> -> */ -> -> } -> -> - if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || -> -> + if ((val_in != barmask && -> -> + (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || -> -> ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || -> -> - ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || -> -> + ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) || -> -> range_covers_byte(addr, l, PCI_COMMAND)) -> -> pci_update_mappings(d); -> -> -> -> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index -> -> ee9dff2..f2bad79 100644 -> -> --- a/hw/pci/pci_bridge.c -> -> +++ b/hw/pci/pci_bridge.c -> -> @@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d, -> -> PCIBridge *s = PCI_BRIDGE(d); -> -> uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL); -> -> uint16_t newctl; -> -> + uint64_t barmask = (1 << len * 8) - 1; -> -> -> -> pci_default_write_config(d, address, val, len); -> -> -> -> if (ranges_overlap(address, len, PCI_COMMAND, 2) || -> -> -> -> - /* io base/limit */ -> -> - ranges_overlap(address, len, PCI_IO_BASE, 2) || -> -> + (val != barmask && -> -> + /* io base/limit */ -> -> + 
(ranges_overlap(address, len, PCI_IO_BASE, 2) || -> -> -> -> /* memory base/limit, prefetchable base/limit and -> -> io base/limit upper 16 */ -> -> - ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || -> -> + ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) || -> -> -> -> /* vga enable */ -> -> ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { -> -> -- -> -> 1.8.3.1 -> -> -> -> -> -> - -On Mon, Jan 07, 2019 at 03:28:36PM +0000, xuyandong wrote: -> -> -> -> -----Original Message----- -> -> From: Michael S. Tsirkin [ -mailto:address@hidden -> -> Sent: Monday, January 07, 2019 11:06 PM -> -> To: xuyandong <address@hidden> -> -> Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu- -> -> address@hidden; Zhanghailiang <address@hidden>; -> -> wangxin (U) <address@hidden>; Huangweidong (C) -> -> <address@hidden> -> -> Subject: Re: [BUG]Unassigned mem write during pci device hot-plug -> -> -> -> On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote: -> -> > > > > > > > > > > > Hi all, -> -> > > > > > > > > > > > -> -> > > > > > > > > > > > -> -> > > > > > > > > > > > -> -> > > > > > > > > > > > In our test, we configured VM with several -> -> > > > > > > > > > > > pci-bridges and a virtio-net nic been attached -> -> > > > > > > > > > > > with bus 4, -> -> > > > > > > > > > > > -> -> > > > > > > > > > > > After VM is startup, We ping this nic from host to -> -> > > > > > > > > > > > judge if it is working normally. Then, we hot add -> -> > > > > > > > > > > > pci devices to this VM with bus -> -> > > > > > > 0. 
-> -> > > > > > > > > > > > -> -> > > > > > > > > > > > We found the virtio-net NIC in bus 4 is not -> -> > > > > > > > > > > > working (can not -> -> > > > > > > > > > > > connect) occasionally, as it kick virtio backend -> -> > > > > > > > > > > > failure with error -> -> > -> -> > > > > > But I have another question, if we only fix this problem in -> -> > > > > > the kernel, the Linux version that has been released does not -> -> > > > > > work well on the -> -> > > > > virtualization platform. -> -> > > > > > Is there a way to fix this problem in the backend? -> -> > -> -> > Hi Michael, -> -> > -> -> > If we want to fix this problem on the backend, it is not enough to -> -> > consider only PCI device hot plugging, because I found that if we use -> -> > a command like "echo 1 > /sys/bus/pci/rescan" in guest, this problem is -> -> > very -> -> easy to reproduce. -> -> > -> -> > From the perspective of device emulation, when guest writes 0xffffffff -> -> > to the BAR, guest just want to get the size of the region but not really -> -> updating the address space. -> -> > So I made the following patch to avoid update pci mapping. -> -> > -> -> > Do you think this make sense? -> -> > -> -> > [PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR -> -> > -> -> > When guest writes 0xffffffff to the BAR, guest just want to get the -> -> > size of the region but not really updating the address space. -> -> > So when guest writes 0xffffffff to BAR, we need avoid -> -> > pci_update_mappings or pci_bridge_update_mappings. -> -> > -> -> > Signed-off-by: xuyandong <address@hidden> -> -> -> -> I see how that will address the common case however there are a bunch of -> -> issues here. First of all it's easy to trigger the update by some other -> -> action like -> -> VM migration. More importantly it's just possible that guest actually does -> -> want -> -> to set the low 32 bit of the address to all ones. 
For example, that is -> -> clearly -> -> listed as a way to disable all devices behind the bridge in the pci to pci -> -> bridge -> -> spec. -> -> -Ok, I see. If I only skip upate when guest writing 0xFFFFFFFF to Prefetcable -> -Base Upper 32 Bits -> -to meet the kernel double check problem. -> -Do you think there is still risk? -Well it's non zero since spec says such a write should disable all -accesses. Just an idea: why not add an option to disable upper 32 bit? -That is ugly and limits space but spec compliant. - -> -> -> -> Given upstream is dragging it's feet I'm open to adding a flag that will -> -> help -> -> keep guests going as a temporary measure. -> -> We will need to think about ways to restrict this as much as we can. -> -> -> -> -> -> > --- -> -> > hw/pci/pci.c | 6 ++++-- -> -> > hw/pci/pci_bridge.c | 8 +++++--- -> -> > 2 files changed, 9 insertions(+), 5 deletions(-) -> -> > -> -> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644 -> -> > --- a/hw/pci/pci.c -> -> > +++ b/hw/pci/pci.c -> -> > @@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, -> -> > uint32_t addr, uint32_t val_in, int { -> -> > int i, was_irq_disabled = pci_irq_disabled(d); -> -> > uint32_t val = val_in; -> -> > + uint64_t barmask = (1 << l*8) - 1; -> -> > -> -> > for (i = 0; i < l; val >>= 8, ++i) { -> -> > uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@ -> -> > void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t -> -> > val_in, -> -> int -> -> > d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & -> -> > wmask); -> -> > d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to -> -> > Clear */ -> -> > } -> -> > - if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || -> -> > + if ((val_in != barmask && -> -> > + (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || -> -> > ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || -> -> > - ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || -> -> > + ranges_overlap(addr, 
l, PCI_ROM_ADDRESS1, 4))) || -> -> > range_covers_byte(addr, l, PCI_COMMAND)) -> -> > pci_update_mappings(d); -> -> > -> -> > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index -> -> > ee9dff2..f2bad79 100644 -> -> > --- a/hw/pci/pci_bridge.c -> -> > +++ b/hw/pci/pci_bridge.c -> -> > @@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d, -> -> > PCIBridge *s = PCI_BRIDGE(d); -> -> > uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL); -> -> > uint16_t newctl; -> -> > + uint64_t barmask = (1 << len * 8) - 1; -> -> > -> -> > pci_default_write_config(d, address, val, len); -> -> > -> -> > if (ranges_overlap(address, len, PCI_COMMAND, 2) || -> -> > -> -> > - /* io base/limit */ -> -> > - ranges_overlap(address, len, PCI_IO_BASE, 2) || -> -> > + (val != barmask && -> -> > + /* io base/limit */ -> -> > + (ranges_overlap(address, len, PCI_IO_BASE, 2) || -> -> > -> -> > /* memory base/limit, prefetchable base/limit and -> -> > io base/limit upper 16 */ -> -> > - ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || -> -> > + ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) || -> -> > -> -> > /* vga enable */ -> -> > ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { -> -> > -- -> -> > 1.8.3.1 -> -> > -> -> > -> -> > - -> ------Original Message----- -> -From: xuyandong -> -Sent: Monday, January 07, 2019 10:37 PM -> -To: 'Michael S. 
Tsirkin' <address@hidden> -> -Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu- -> -address@hidden; Zhanghailiang <address@hidden>; -> -wangxin (U) <address@hidden>; Huangweidong (C) -> -<address@hidden> -> -Subject: RE: [BUG]Unassigned mem write during pci device hot-plug -> -> -> > > > > > > > > > Hi all, -> -> > > > > > > > > > -> -> > > > > > > > > > -> -> > > > > > > > > > -> -> > > > > > > > > > In our test, we configured VM with several -> -> > > > > > > > > > pci-bridges and a virtio-net nic been attached with -> -> > > > > > > > > > bus 4, -> -> > > > > > > > > > -> -> > > > > > > > > > After VM is startup, We ping this nic from host to -> -> > > > > > > > > > judge if it is working normally. Then, we hot add -> -> > > > > > > > > > pci devices to this VM with bus -> -> > > > > 0. -> -> > > > > > > > > > -> -> > > > > > > > > > We found the virtio-net NIC in bus 4 is not working -> -> > > > > > > > > > (can not -> -> > > > > > > > > > connect) occasionally, as it kick virtio backend -> -> > > > > > > > > > failure with error -> -> -> > > > But I have another question, if we only fix this problem in the -> -> > > > kernel, the Linux version that has been released does not work -> -> > > > well on the -> -> > > virtualization platform. -> -> > > > Is there a way to fix this problem in the backend? -> -> > > -> -> > > There could we a way to work around this. -> -> > > Does below help? -> -> > -> -> > I am sorry to tell you, I tested this patch and it doesn't work fine. 
-> -> > -> -> > > -> -> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index -> -> > > 236a20eaa8..7834cac4b0 100644 -> -> > > --- a/hw/i386/acpi-build.c -> -> > > +++ b/hw/i386/acpi-build.c -> -> > > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml -> -> > > *parent_scope, PCIBus *bus, -> -> > > -> -> > > aml_append(method, aml_store(aml_int(bsel_val), -> -> aml_name("BNUM"))); -> -> > > aml_append(method, -> -> > > - aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device -> -Check -> -> */) -> -> > > + aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* -> -> > > + Device Check Light */) -> -> > > ); -> -> > > aml_append(method, -> -> > > aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* -> -> > > Eject Request */) -> -> -> -> -> -> Oh I see, another bug: -> -> -> -> case ACPI_NOTIFY_DEVICE_CHECK_LIGHT: -> -> acpi_handle_debug(handle, -> -> "ACPI_NOTIFY_DEVICE_CHECK_LIGHT event\n"); -> -> /* TBD: Exactly what does 'light' mean? */ -> -> break; -> -> -> -> And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 -> -> type) and friends all just ignore this event type. -> -> -> -> -> -> -> -> -- -> -> MST -> -> -Hi Michael, -> -> -If we want to fix this problem on the backend, it is not enough to consider -> -only -> -PCI device hot plugging, because I found that if we use a command like "echo -> -1 > -> -/sys/bus/pci/rescan" in guest, this problem is very easy to reproduce. -> -> -From the perspective of device emulation, when guest writes 0xffffffff to the -> -BAR, guest just want to get the size of the region but not really updating the -> -address space. -> -So I made the following patch to avoid update pci mapping. -> -> -Do you think this make sense? -> -> -[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR -> -> -When guest writes 0xffffffff to the BAR, guest just want to get the size of -> -the -> -region but not really updating the address space. 
-> -So when guest writes 0xffffffff to BAR, we need avoid pci_update_mappings or -> -pci_bridge_update_mappings. -> -> -Signed-off-by: xuyandong <address@hidden> -> ---- -> -hw/pci/pci.c | 6 ++++-- -> -hw/pci/pci_bridge.c | 8 +++++--- -> -2 files changed, 9 insertions(+), 5 deletions(-) -> -> -diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644 -> ---- a/hw/pci/pci.c -> -+++ b/hw/pci/pci.c -> -@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t -> -addr, uint32_t val_in, int { -> -int i, was_irq_disabled = pci_irq_disabled(d); -> -uint32_t val = val_in; -> -+ uint64_t barmask = (1 << l*8) - 1; -> -> -for (i = 0; i < l; val >>= 8, ++i) { -> -uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@ void -> -pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int -> -d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask); -> -d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */ -> -} -> -- if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || -> -+ if ((val_in != barmask && -> -+ (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || -> -ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || -> -- ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || -> -+ ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) || -> -range_covers_byte(addr, l, PCI_COMMAND)) -> -pci_update_mappings(d); -> -> -diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index ee9dff2..f2bad79 -> -100644 -> ---- a/hw/pci/pci_bridge.c -> -+++ b/hw/pci/pci_bridge.c -> -@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d, -> -PCIBridge *s = PCI_BRIDGE(d); -> -uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL); -> -uint16_t newctl; -> -+ uint64_t barmask = (1 << len * 8) - 1; -> -> -pci_default_write_config(d, address, val, len); -> -> -if (ranges_overlap(address, len, PCI_COMMAND, 2) || -> -> -- /* io base/limit */ -> -- ranges_overlap(address, len, PCI_IO_BASE, 2) || -> -+ (val != barmask && -> -+ /* io 
base/limit */ -> -+ (ranges_overlap(address, len, PCI_IO_BASE, 2) || -> -> -/* memory base/limit, prefetchable base/limit and -> -io base/limit upper 16 */ -> -- ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || -> -+ ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) || -> -> -/* vga enable */ -> -ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { -> --- -> -1.8.3.1 -> -> -Sorry, please ignore the patch above. - -Here is the patch I want to post: - -diff --git a/hw/pci/pci.c b/hw/pci/pci.c -index 56b13b3..38a300f 100644 ---- a/hw/pci/pci.c -+++ b/hw/pci/pci.c -@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t -addr, uint32_t val_in, int - { - int i, was_irq_disabled = pci_irq_disabled(d); - uint32_t val = val_in; -+ uint64_t barmask = ((uint64_t)1 << l*8) - 1; - - for (i = 0; i < l; val >>= 8, ++i) { - uint8_t wmask = d->wmask[addr + i]; -@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t -addr, uint32_t val_in, int - d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask); - d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */ - } -- if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || -+ if ((val_in != barmask && -+ (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || - ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || -- ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || -+ ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) || - range_covers_byte(addr, l, PCI_COMMAND)) - pci_update_mappings(d); - -diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c -index ee9dff2..b8f7d48 100644 ---- a/hw/pci/pci_bridge.c -+++ b/hw/pci/pci_bridge.c -@@ -253,20 +253,22 @@ void pci_bridge_write_config(PCIDevice *d, - PCIBridge *s = PCI_BRIDGE(d); - uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL); - uint16_t newctl; -+ uint64_t barmask = ((uint64_t)1 << len * 8) - 1; - - pci_default_write_config(d, address, val, len); - - if (ranges_overlap(address, len, PCI_COMMAND, 2) || - -- /* io 
base/limit */ -- ranges_overlap(address, len, PCI_IO_BASE, 2) || -+ /* vga enable */ -+ ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2) || - -- /* memory base/limit, prefetchable base/limit and -- io base/limit upper 16 */ -- ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || -+ (val != barmask && -+ /* io base/limit */ -+ (ranges_overlap(address, len, PCI_IO_BASE, 2) || - -- /* vga enable */ -- ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { -+ /* memory base/limit, prefetchable base/limit and -+ io base/limit upper 16 */ -+ ranges_overlap(address, len, PCI_MEMORY_BASE, 20)))) { - pci_bridge_update_mappings(s); - } - --- -1.8.3.1 - diff --git a/results/classifier/016/virtual/70416488 b/results/classifier/016/virtual/70416488 deleted file mode 100644 index 8e21bd76..00000000 --- a/results/classifier/016/virtual/70416488 +++ /dev/null @@ -1,1206 +0,0 @@ -virtual: 0.872 -debug: 0.857 -boot: 0.817 -kernel: 0.804 -hypervisor: 0.767 -arm: 0.323 -KVM: 0.289 -operating system: 0.251 -TCG: 0.103 -VMM: 0.064 -device: 0.047 -PID: 0.037 -register: 0.032 -files: 0.027 -assembly: 0.017 -semantic: 0.016 -socket: 0.015 -peripherals: 0.013 -user-level: 0.009 -performance: 0.009 -vnc: 0.007 -architecture: 0.006 -risc-v: 0.004 -network: 0.003 -alpha: 0.003 -permissions: 0.002 -graphic: 0.002 -ppc: 0.002 -x86: 0.000 -mistranslation: 0.000 -i386: 0.000 - -[Bug Report] smmuv3 event 0x10 report when running virtio-blk-pci - -Hi All, - -When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10 -during kernel booting up. 
- -qemu command which I use is as below: - -qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ --kernel Image -initrd minifs.cpio.gz \ --enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ --append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ --device -pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 -\ --device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \ --device -virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ --drive file=/home/boot.img,if=none,id=drive0,format=raw - -smmuv3 event 0x10 log: -[...] -[ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 -[ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) -[ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues -[ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks (1.07 -GB/1.00 GiB) -[ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -[ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 -[ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -[ 1.968381] clk: Disabling unused clocks -[ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -[ 1.968990] PM: genpd: Disabling unused power domains -[ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -[ 1.969814] ALSA device list: -[ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -[ 1.970471] No soundcards found. 
-[ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -[ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -[ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -[ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -[ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -[ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -[ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -[ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -[ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -[ 1.975005] Freeing unused kernel memory: 10112K -[ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -[ 1.975442] Run init as init process - -Another information is that if "maxcpus=3" is removed from the kernel command -line, -it will be OK. - -I am not sure if there is a bug about vsmmu. It will be very appreciated if -anyone -know this issue or can take a look at it. - -Thanks, -Zhou - -On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote: -> -> -Hi All, -> -> -When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10 -> -during kernel booting up. -Does it still do this if you either: - (1) use the v9.1.0 release (commit fd1952d814da) - (2) use "-machine virt-9.1" instead of "-machine virt" - -? - -My suspicion is that this will have started happening now that -we expose an SMMU with two-stage translation support to the guest -in the "virt" machine type (which we do not if you either -use virt-9.1 or in the v9.1.0 release). - -I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of -the two-stage support). 
- -> -qemu command which I use is as below: -> -> -qemu-system-aarch64 -machine -> -virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ -> --kernel Image -initrd minifs.cpio.gz \ -> --enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ -> --append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ -> --device -> -pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 -> -\ -> --device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \ -> --device -> -virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ -> --drive file=/home/boot.img,if=none,id=drive0,format=raw -> -> -smmuv3 event 0x10 log: -> -[...] -> -[ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 -> -[ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) -> -[ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues -> -[ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks -> -(1.07 GB/1.00 GiB) -> -[ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> -[ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 -> -[ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> -[ 1.968381] clk: Disabling unused clocks -> -[ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> -[ 1.968990] PM: genpd: Disabling unused power domains -> -[ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -[ 1.969814] ALSA device list: -> -[ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -[ 1.970471] No soundcards found. 
-> -[ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> -[ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> -[ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> -[ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -[ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -[ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> -[ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> -[ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> -[ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -[ 1.975005] Freeing unused kernel memory: 10112K -> -[ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -[ 1.975442] Run init as init process -> -> -Another information is that if "maxcpus=3" is removed from the kernel command -> -line, -> -it will be OK. -> -> -I am not sure if there is a bug about vsmmu. It will be very appreciated if -> -anyone -> -know this issue or can take a look at it. -thanks --- PMM - -On 2024/9/9 22:31, Peter Maydell wrote: -> -On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote: -> -> -> -> Hi All, -> -> -> -> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10 -> -> during kernel booting up. -> -> -Does it still do this if you either: -> -(1) use the v9.1.0 release (commit fd1952d814da) -> -(2) use "-machine virt-9.1" instead of "-machine virt" -I tested above two cases, the problem is still there. - -> -> -? -> -> -My suspicion is that this will have started happening now that -> -we expose an SMMU with two-stage translation support to the guest -> -in the "virt" machine type (which we do not if you either -> -use virt-9.1 or in the v9.1.0 release). -> -> -I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of -> -the two-stage support). 
-> -> -> qemu command which I use is as below: -> -> -> -> qemu-system-aarch64 -machine -> -> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ -> -> -kernel Image -initrd minifs.cpio.gz \ -> -> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ -> -> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ -> -> -device -> -> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 -> -> \ -> -> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \ -> -> -device -> -> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ -> -> -drive file=/home/boot.img,if=none,id=drive0,format=raw -> -> -> -> smmuv3 event 0x10 log: -> -> [...] -> -> [ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 -> -> [ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) -> -> [ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues -> -> [ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks -> -> (1.07 GB/1.00 GiB) -> -> [ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> -> [ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 -> -> [ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> -> [ 1.968381] clk: Disabling unused clocks -> -> [ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> -> [ 1.968990] PM: genpd: Disabling unused power domains -> -> [ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -> [ 1.969814] ALSA device list: -> -> [ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -> [ 1.970471] No soundcards found. 
-> -> [ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> -> [ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> -> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> -> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -> [ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -> [ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> -> [ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> -> [ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> -> [ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -> [ 1.975005] Freeing unused kernel memory: 10112K -> -> [ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -> [ 1.975442] Run init as init process -> -> -> -> Another information is that if "maxcpus=3" is removed from the kernel -> -> command line, -> -> it will be OK. -> -> -> -> I am not sure if there is a bug about vsmmu. It will be very appreciated if -> -> anyone -> -> know this issue or can take a look at it. -> -> -thanks -> --- PMM -> -. - -Hi Zhou, -On 9/10/24 03:24, Zhou Wang via wrote: -> -On 2024/9/9 22:31, Peter Maydell wrote: -> -> On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote: -> ->> Hi All, -> ->> -> ->> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10 -> ->> during kernel booting up. -> -> Does it still do this if you either: -> -> (1) use the v9.1.0 release (commit fd1952d814da) -> -> (2) use "-machine virt-9.1" instead of "-machine virt" -> -I tested above two cases, the problem is still there. -Thank you for reporting. I am able to reproduce and effectively the -maxcpus kernel option is triggering the issue. It works without. I will -come back to you asap. - -Eric -> -> -> ? 
-> -> -> -> My suspicion is that this will have started happening now that -> -> we expose an SMMU with two-stage translation support to the guest -> -> in the "virt" machine type (which we do not if you either -> -> use virt-9.1 or in the v9.1.0 release). -> -> -> -> I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of -> -> the two-stage support). -> -> -> ->> qemu command which I use is as below: -> ->> -> ->> qemu-system-aarch64 -machine -> ->> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ -> ->> -kernel Image -initrd minifs.cpio.gz \ -> ->> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ -> ->> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ -> ->> -device -> ->> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 -> ->> \ -> ->> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \ -> ->> -device -> ->> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ -> ->> -drive file=/home/boot.img,if=none,id=drive0,format=raw -> ->> -> ->> smmuv3 event 0x10 log: -> ->> [...] 
-> ->> [ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 -> ->> [ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) -> ->> [ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues -> ->> [ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks -> ->> (1.07 GB/1.00 GiB) -> ->> [ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> ->> [ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 -> ->> [ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> ->> [ 1.968381] clk: Disabling unused clocks -> ->> [ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> ->> [ 1.968990] PM: genpd: Disabling unused power domains -> ->> [ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.969814] ALSA device list: -> ->> [ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.970471] No soundcards found. -> ->> [ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> ->> [ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> ->> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> ->> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> ->> [ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> ->> [ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> ->> [ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.975005] Freeing unused kernel memory: 10112K -> ->> [ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.975442] Run init as init process -> ->> -> ->> Another information is that if "maxcpus=3" is removed from the kernel -> ->> command line, -> ->> it will be OK. -> ->> -> ->> I am not sure if there is a bug about vsmmu. It will be very appreciated if -> ->> anyone -> ->> know this issue or can take a look at it. -> -> thanks -> -> -- PMM -> -> . 
- -Hi, - -On 9/10/24 03:24, Zhou Wang via wrote: -> -On 2024/9/9 22:31, Peter Maydell wrote: -> -> On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote: -> ->> Hi All, -> ->> -> ->> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10 -> ->> during kernel booting up. -> -> Does it still do this if you either: -> -> (1) use the v9.1.0 release (commit fd1952d814da) -> -> (2) use "-machine virt-9.1" instead of "-machine virt" -> -I tested above two cases, the problem is still there. -I have not much progressed yet but I see it comes with -qemu traces. - -smmuv3-iommu-memory-region-0-0 translation failed for iova=0x0 -(SMMU_EVT_F_TRANSLATION) -../.. -qemu-system-aarch64: virtio-blk failed to set guest notifier (-22), -ensure -accel kvm is set. -qemu-system-aarch64: virtio_bus_start_ioeventfd: failed. Fallback to -userspace (slower). - -the PCIe Host bridge seems to cause that translation failure at iova=0 - -Also virtio-iommu has the same issue: -qemu-system-aarch64: virtio_iommu_translate no mapping for 0x0 for sid=1024 -qemu-system-aarch64: virtio-blk failed to set guest notifier (-22), -ensure -accel kvm is set. -qemu-system-aarch64: virtio_bus_start_ioeventfd: failed. Fallback to -userspace (slower). - -Only happens with maxcpus=3. Note the virtio-blk-pci is not protected by -the vIOMMU in your case. - -Thanks - -Eric - -> -> -> ? -> -> -> -> My suspicion is that this will have started happening now that -> -> we expose an SMMU with two-stage translation support to the guest -> -> in the "virt" machine type (which we do not if you either -> -> use virt-9.1 or in the v9.1.0 release). -> -> -> -> I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of -> -> the two-stage support). 
-> -> -> ->> qemu command which I use is as below: -> ->> -> ->> qemu-system-aarch64 -machine -> ->> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ -> ->> -kernel Image -initrd minifs.cpio.gz \ -> ->> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ -> ->> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ -> ->> -device -> ->> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 -> ->> \ -> ->> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \ -> ->> -device -> ->> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ -> ->> -drive file=/home/boot.img,if=none,id=drive0,format=raw -> ->> -> ->> smmuv3 event 0x10 log: -> ->> [...] -> ->> [ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 -> ->> [ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) -> ->> [ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues -> ->> [ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks -> ->> (1.07 GB/1.00 GiB) -> ->> [ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> ->> [ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 -> ->> [ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> ->> [ 1.968381] clk: Disabling unused clocks -> ->> [ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> ->> [ 1.968990] PM: genpd: Disabling unused power domains -> ->> [ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.969814] ALSA device list: -> ->> [ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.970471] No soundcards found. 
-> ->> [ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> ->> [ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> ->> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> ->> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> ->> [ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> ->> [ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> ->> [ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.975005] Freeing unused kernel memory: 10112K -> ->> [ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.975442] Run init as init process -> ->> -> ->> Another information is that if "maxcpus=3" is removed from the kernel -> ->> command line, -> ->> it will be OK. -> ->> -> ->> I am not sure if there is a bug about vsmmu. It will be very appreciated if -> ->> anyone -> ->> know this issue or can take a look at it. -> -> thanks -> -> -- PMM -> -> . - -Hi Zhou, - -On Mon, Sep 9, 2024 at 3:22 PM Zhou Wang via <qemu-devel@nongnu.org> wrote: -> -> -Hi All, -> -> -When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10 -> -during kernel booting up. -> -> -qemu command which I use is as below: -> -> -qemu-system-aarch64 -machine -> -virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ -> --kernel Image -initrd minifs.cpio.gz \ -> --enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ -> --append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ -> --device -> -pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 -> -\ -> --device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \ -> --device -> -virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ -> --drive file=/home/boot.img,if=none,id=drive0,format=raw -> -> -smmuv3 event 0x10 log: -> -[...] 
-> -[ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 -> -[ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) -> -[ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues -> -[ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks -> -(1.07 GB/1.00 GiB) -> -[ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> -[ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 -> -[ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> -[ 1.968381] clk: Disabling unused clocks -> -[ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> -[ 1.968990] PM: genpd: Disabling unused power domains -> -[ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -[ 1.969814] ALSA device list: -> -[ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -[ 1.970471] No soundcards found. -> -[ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> -[ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> -[ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> -[ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -[ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -[ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> -[ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> -[ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> -[ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -[ 1.975005] Freeing unused kernel memory: 10112K -> -[ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -[ 1.975442] Run init as init process -> -> -Another information is that if "maxcpus=3" is removed from the kernel command -> -line, -> -it will be OK. -> -That's interesting, not sure how that would be related. - -> -I am not sure if there is a bug about vsmmu. It will be very appreciated if -> -anyone -> -know this issue or can take a look at it. -> -Can you please provide logs with adding "-d trace:smmu*" to qemu invocation. 
- -Also if possible, can you please provide which Linux kernel version -you are using, I will see if I can repro. - -Thanks, -Mostafa - -> -Thanks, -> -Zhou -> -> -> - -On 2024/9/9 22:47, Mostafa Saleh wrote: -> -Hi Zhou, -> -> -On Mon, Sep 9, 2024 at 3:22 PM Zhou Wang via <qemu-devel@nongnu.org> wrote: -> -> -> -> Hi All, -> -> -> -> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10 -> -> during kernel booting up. -> -> -> -> qemu command which I use is as below: -> -> -> -> qemu-system-aarch64 -machine -> -> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ -> -> -kernel Image -initrd minifs.cpio.gz \ -> -> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ -> -> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ -> -> -device -> -> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 -> -> \ -> -> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \ -> -> -device -> -> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ -> -> -drive file=/home/boot.img,if=none,id=drive0,format=raw -> -> -> -> smmuv3 event 0x10 log: -> -> [...]
-> -> [ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 -> -> [ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) -> -> [ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues -> -> [ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks -> -> (1.07 GB/1.00 GiB) -> -> [ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> -> [ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 -> -> [ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> -> [ 1.968381] clk: Disabling unused clocks -> -> [ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> -> [ 1.968990] PM: genpd: Disabling unused power domains -> -> [ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -> [ 1.969814] ALSA device list: -> -> [ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -> [ 1.970471] No soundcards found. -> -> [ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> -> [ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> -> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> -> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -> [ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -> [ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> -> [ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> -> [ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> -> [ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -> [ 1.975005] Freeing unused kernel memory: 10112K -> -> [ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> -> [ 1.975442] Run init as init process -> -> -> -> Another information is that if "maxcpus=3" is removed from the kernel -> -> command line, -> -> it will be OK. -> -> -> -> -That's interesting, not sure how that would be related. -> -> -> I am not sure if there is a bug about vsmmu. It will be very appreciated if -> -> anyone -> -> know this issue or can take a look at it. 
-> -> -> -> -Can you please provide logs with adding "-d trace:smmu*" to qemu invocation. -Sure. Please see the attached log(using above qemu commit and command). - -> -> -Also if possible, can you please provide which Linux kernel version -> -you are using, I will see if I can repro. -I just use the latest mainline kernel(commit b831f83e40a2) with defconfig. - -Thanks, -Zhou - -> -> -Thanks, -> -Mostafa -> -> -> Thanks, -> -> Zhou -> -> -> -> -> -> -> -> -. -qemu_boot_log.txt -Description: -Text document - -On Tue, Sep 10, 2024 at 2:51 AM Zhou Wang <wangzhou1@hisilicon.com> wrote: -> -> -On 2024/9/9 22:47, Mostafa Saleh wrote: -> -> Hi Zhou, -> -> -> -> On Mon, Sep 9, 2024 at 3:22 PM Zhou Wang via <qemu-devel@nongnu.org> wrote: -> ->> -> ->> Hi All, -> ->> -> ->> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event -> ->> 0x10 -> ->> during kernel booting up. -> ->> -> ->> qemu command which I use is as below: -> ->> -> ->> qemu-system-aarch64 -machine -> ->> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ -> ->> -kernel Image -initrd minifs.cpio.gz \ -> ->> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ -> ->> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ -> ->> -device -> ->> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 -> ->> \ -> ->> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 -> ->> \ -> ->> -device -> ->> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ -> ->> -drive file=/home/boot.img,if=none,id=drive0,format=raw -> ->> -> ->> smmuv3 event 0x10 log: -> ->> [...]
-> ->> [ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 -> ->> [ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) -> ->> [ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues -> ->> [ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks -> ->> (1.07 GB/1.00 GiB) -> ->> [ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> ->> [ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 -> ->> [ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> ->> [ 1.968381] clk: Disabling unused clocks -> ->> [ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> ->> [ 1.968990] PM: genpd: Disabling unused power domains -> ->> [ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.969814] ALSA device list: -> ->> [ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.970471] No soundcards found. -> ->> [ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> ->> [ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> ->> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> ->> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: -> ->> [ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 -> ->> [ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 -> ->> [ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.975005] Freeing unused kernel memory: 10112K -> ->> [ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 -> ->> [ 1.975442] Run init as init process -> ->> -> ->> Another information is that if "maxcpus=3" is removed from the kernel -> ->> command line, -> ->> it will be OK. -> ->> -> -> -> -> That's interesting, not sure how that would be related. -> -> -> ->> I am not sure if there is a bug about vsmmu. 
It will be very appreciated -> ->> if anyone -> ->> know this issue or can take a look at it. -> ->> -> -> -> -> Can you please provide logs with adding "-d trace:smmu*" to qemu invocation. -> -> -Sure. Please see the attached log(using above qemu commit and command). -> -Thanks a lot, it seems the SMMUv3 indeed receives a translation -request with addr 0x0 which causes this event. -I don't see any kind of modification (alignment) of the address in this path. -So my hunch it's not related to the SMMUv3 and the initiator is -issuing bogus addresses. - -> -> -> -> Also if possible, can you please provide which Linux kernel version -> -> you are using, I will see if I can repro. -> -> -I just use the latest mainline kernel(commit b831f83e40a2) with defconfig. -> -I see, I can't repro in my setup which has no "--enable-kvm" and with -"-cpu max" instead of host. -I will try other options and see if I can repro. - -Thanks, -Mostafa -> -Thanks, -> -Zhou -> -> -> -> -> Thanks, -> -> Mostafa -> -> -> ->> Thanks, -> ->> Zhou -> ->> -> ->> -> ->> -> -> -> -> . - diff --git a/results/classifier/016/virtual/74466963 b/results/classifier/016/virtual/74466963 deleted file mode 100644 index a738c771..00000000 --- a/results/classifier/016/virtual/74466963 +++ /dev/null @@ -1,1905 +0,0 @@ -TCG: 0.983 -virtual: 0.972 -hypervisor: 0.881 -vnc: 0.807 -debug: 0.545 -x86: 0.169 -operating system: 0.077 -network: 0.046 -socket: 0.038 -boot: 0.036 -register: 0.035 -device: 0.027 -PID: 0.016 -files: 0.014 -VMM: 0.013 -user-level: 0.006 -assembly: 0.006 -kernel: 0.006 -ppc: 0.006 -semantic: 0.006 -performance: 0.005 -architecture: 0.004 -KVM: 0.003 -risc-v: 0.002 -peripherals: 0.002 -permissions: 0.002 -alpha: 0.002 -graphic: 0.001 -arm: 0.001 -mistranslation: 0.000 -i386: 0.000 - -[Qemu-devel] [TCG only][Migration Bug? 
] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration - -Hi all, - -Does anyboday remember the similar issue post by hailiang months ago -http://patchwork.ozlabs.org/patch/454322/ -At least tow bugs about migration had been fixed since that. -And now we found the same issue at the tcg vm(kvm is fine), after -migration, the content VM's memory is inconsistent. -we add a patch to check memory content, you can find it from affix - -steps to reporduce: -1) apply the patch and re-build qemu -2) prepare the ubuntu guest and run memtest in grub. -soruce side: -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -pc-i440fx-2.3,accel=tcg,usb=off -destination side: -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 -3) start migration -with 1000M NIC, migration will finish within 3 min. - -at source: -(qemu) migrate tcp:192.168.2.66:8881 -after saving ram complete -e9e725df678d392b1a83b3a917f332bb -qemu-system-x86_64: end ram md5 -(qemu) - -at destination: -...skip... 
-Completed load of VM with exit code 0 seq iteration 1264 -Completed load of VM with exit code 0 seq iteration 1265 -Completed load of VM with exit code 0 seq iteration 1266 -qemu-system-x86_64: after loading state section id 2(ram) -49c2dac7bde0e5e22db7280dcb3824f9 -qemu-system-x86_64: end ram md5 -qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init - -49c2dac7bde0e5e22db7280dcb3824f9 -qemu-system-x86_64: end ram md5 - -This occurs occasionally and only at tcg machine. It seems that -some pages dirtied in source side don't transferred to destination. -This problem can be reproduced even if we disable virtio. -Is it OK for some pages that not transferred to destination when do -migration ? Or is it a bug? -Any idea... - -=================md5 check patch============================= - -diff --git a/Makefile.target b/Makefile.target -index 962d004..e2cb8e9 100644 ---- a/Makefile.target -+++ b/Makefile.target -@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o - obj-y += memory_mapping.o - obj-y += dump.o - obj-y += migration/ram.o migration/savevm.o --LIBS := $(libs_softmmu) $(LIBS) -+LIBS := $(libs_softmmu) $(LIBS) -lplumb - - # xen support - obj-$(CONFIG_XEN) += xen-common.o -diff --git a/migration/ram.c b/migration/ram.c -index 1eb155a..3b7a09d 100644 ---- a/migration/ram.c -+++ b/migration/ram.c -@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int -version_id) -} - - rcu_read_unlock(); -- DPRINTF("Completed load of VM with exit code %d seq iteration " -+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " - "%" PRIu64 "\n", ret, seq_iter); - return ret; - } -diff --git a/migration/savevm.c b/migration/savevm.c -index 0ad1b93..3feaa61 100644 ---- a/migration/savevm.c -+++ b/migration/savevm.c -@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) - - } - -+#include "exec/ram_addr.h" -+#include "qemu/rcu_queue.h" -+#include <clplumbing/md5.h> -+#ifndef MD5_DIGEST_LENGTH -+#define 
MD5_DIGEST_LENGTH 16 -+#endif -+ -+static void check_host_md5(void) -+{ -+ int i; -+ unsigned char md[MD5_DIGEST_LENGTH]; -+ rcu_read_lock(); -+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check -'pc.ram' block */ -+ rcu_read_unlock(); -+ -+ MD5(block->host, block->used_length, md); -+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { -+ fprintf(stderr, "%02x", md[i]); -+ } -+ fprintf(stderr, "\n"); -+ error_report("end ram md5"); -+} -+ - void qemu_savevm_state_begin(QEMUFile *f, - const MigrationParams *params) - { -@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile -*f, bool iterable_only) -save_section_header(f, se, QEMU_VM_SECTION_END); - - ret = se->ops->save_live_complete_precopy(f, se->opaque); -+ -+ fprintf(stderr, "after saving %s complete\n", se->idstr); -+ check_host_md5(); -+ - trace_savevm_section_end(se->idstr, se->section_id, ret); - save_section_footer(f, se); - if (ret < 0) { -@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, -MigrationIncomingState *mis) -section_id, le->se->idstr); - return ret; - } -+ if (section_type == QEMU_VM_SECTION_END) { -+ error_report("after loading state section id %d(%s)", -+ section_id, le->se->idstr); -+ check_host_md5(); -+ } - if (!check_section_footer(f, le)) { - return -EINVAL; - } -@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) - } - - cpu_synchronize_all_post_init(); -+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); -+ check_host_md5(); - - return ret; - } - -* Li Zhijian (address@hidden) wrote: -> -Hi all, -> -> -Does anyboday remember the similar issue post by hailiang months ago -> -http://patchwork.ozlabs.org/patch/454322/ -> -At least tow bugs about migration had been fixed since that. -Yes, I wondered what happened to that. - -> -And now we found the same issue at the tcg vm(kvm is fine), after migration, -> -the content VM's memory is inconsistent. 
-Hmm, TCG only - I don't know much about that; but I guess something must -be accessing memory without using the proper macros/functions so -it doesn't mark it as dirty. - -> -we add a patch to check memory content, you can find it from affix -> -> -steps to reporduce: -> -1) apply the patch and re-build qemu -> -2) prepare the ubuntu guest and run memtest in grub. -> -soruce side: -> -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -> -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -> -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -> -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -> --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -> -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -> -pc-i440fx-2.3,accel=tcg,usb=off -> -> -destination side: -> -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -> -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -> -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -> -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -> --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -> -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -> -pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 -> -> -3) start migration -> -with 1000M NIC, migration will finish within 3 min. -> -> -at source: -> -(qemu) migrate tcp:192.168.2.66:8881 -> -after saving ram complete -> -e9e725df678d392b1a83b3a917f332bb -> -qemu-system-x86_64: end ram md5 -> -(qemu) -> -> -at destination: -> -...skip... 
-> -Completed load of VM with exit code 0 seq iteration 1264 -> -Completed load of VM with exit code 0 seq iteration 1265 -> -Completed load of VM with exit code 0 seq iteration 1266 -> -qemu-system-x86_64: after loading state section id 2(ram) -> -49c2dac7bde0e5e22db7280dcb3824f9 -> -qemu-system-x86_64: end ram md5 -> -qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init -> -> -49c2dac7bde0e5e22db7280dcb3824f9 -> -qemu-system-x86_64: end ram md5 -> -> -This occurs occasionally and only at tcg machine. It seems that -> -some pages dirtied in source side don't transferred to destination. -> -This problem can be reproduced even if we disable virtio. -> -> -Is it OK for some pages that not transferred to destination when do -> -migration ? Or is it a bug? -I'm pretty sure that means it's a bug. Hard to find though, I guess -at least memtest is smaller than a big OS. I think I'd dump the whole -of memory on both sides, hexdump and diff them - I'd guess it would -just be one byte/word different, maybe that would offer some idea what -wrote it. - -Dave - -> -Any idea... 
-> -> -=================md5 check patch============================= -> -> -diff --git a/Makefile.target b/Makefile.target -> -index 962d004..e2cb8e9 100644 -> ---- a/Makefile.target -> -+++ b/Makefile.target -> -@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o -> -obj-y += memory_mapping.o -> -obj-y += dump.o -> -obj-y += migration/ram.o migration/savevm.o -> --LIBS := $(libs_softmmu) $(LIBS) -> -+LIBS := $(libs_softmmu) $(LIBS) -lplumb -> -> -# xen support -> -obj-$(CONFIG_XEN) += xen-common.o -> -diff --git a/migration/ram.c b/migration/ram.c -> -index 1eb155a..3b7a09d 100644 -> ---- a/migration/ram.c -> -+++ b/migration/ram.c -> -@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int -> -version_id) -> -} -> -> -rcu_read_unlock(); -> -- DPRINTF("Completed load of VM with exit code %d seq iteration " -> -+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " -> -"%" PRIu64 "\n", ret, seq_iter); -> -return ret; -> -} -> -diff --git a/migration/savevm.c b/migration/savevm.c -> -index 0ad1b93..3feaa61 100644 -> ---- a/migration/savevm.c -> -+++ b/migration/savevm.c -> -@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) -> -> -} -> -> -+#include "exec/ram_addr.h" -> -+#include "qemu/rcu_queue.h" -> -+#include <clplumbing/md5.h> -> -+#ifndef MD5_DIGEST_LENGTH -> -+#define MD5_DIGEST_LENGTH 16 -> -+#endif -> -+ -> -+static void check_host_md5(void) -> -+{ -> -+ int i; -> -+ unsigned char md[MD5_DIGEST_LENGTH]; -> -+ rcu_read_lock(); -> -+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check -> -'pc.ram' block */ -> -+ rcu_read_unlock(); -> -+ -> -+ MD5(block->host, block->used_length, md); -> -+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { -> -+ fprintf(stderr, "%02x", md[i]); -> -+ } -> -+ fprintf(stderr, "\n"); -> -+ error_report("end ram md5"); -> -+} -> -+ -> -void qemu_savevm_state_begin(QEMUFile *f, -> -const MigrationParams *params) -> -{ -> -@@ -1056,6 +1079,10 @@ void 
qemu_savevm_state_complete_precopy(QEMUFile *f, -> -bool iterable_only) -> -save_section_header(f, se, QEMU_VM_SECTION_END); -> -> -ret = se->ops->save_live_complete_precopy(f, se->opaque); -> -+ -> -+ fprintf(stderr, "after saving %s complete\n", se->idstr); -> -+ check_host_md5(); -> -+ -> -trace_savevm_section_end(se->idstr, se->section_id, ret); -> -save_section_footer(f, se); -> -if (ret < 0) { -> -@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, -> -MigrationIncomingState *mis) -> -section_id, le->se->idstr); -> -return ret; -> -} -> -+ if (section_type == QEMU_VM_SECTION_END) { -> -+ error_report("after loading state section id %d(%s)", -> -+ section_id, le->se->idstr); -> -+ check_host_md5(); -> -+ } -> -if (!check_section_footer(f, le)) { -> -return -EINVAL; -> -} -> -@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) -> -} -> -> -cpu_synchronize_all_post_init(); -> -+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); -> -+ check_host_md5(); -> -> -return ret; -> -} -> -> -> --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - -On 2015/12/3 17:24, Dr. David Alan Gilbert wrote: -* Li Zhijian (address@hidden) wrote: -Hi all, - -Does anyboday remember the similar issue post by hailiang months ago -http://patchwork.ozlabs.org/patch/454322/ -At least tow bugs about migration had been fixed since that. -Yes, I wondered what happened to that. -And now we found the same issue at the tcg vm(kvm is fine), after migration, -the content VM's memory is inconsistent. -Hmm, TCG only - I don't know much about that; but I guess something must -be accessing memory without using the proper macros/functions so -it doesn't mark it as dirty. -we add a patch to check memory content, you can find it from affix - -steps to reporduce: -1) apply the patch and re-build qemu -2) prepare the ubuntu guest and run memtest in grub. 
-soruce side: -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -pc-i440fx-2.3,accel=tcg,usb=off - -destination side: -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 - -3) start migration -with 1000M NIC, migration will finish within 3 min. - -at source: -(qemu) migrate tcp:192.168.2.66:8881 -after saving ram complete -e9e725df678d392b1a83b3a917f332bb -qemu-system-x86_64: end ram md5 -(qemu) - -at destination: -...skip... -Completed load of VM with exit code 0 seq iteration 1264 -Completed load of VM with exit code 0 seq iteration 1265 -Completed load of VM with exit code 0 seq iteration 1266 -qemu-system-x86_64: after loading state section id 2(ram) -49c2dac7bde0e5e22db7280dcb3824f9 -qemu-system-x86_64: end ram md5 -qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init - -49c2dac7bde0e5e22db7280dcb3824f9 -qemu-system-x86_64: end ram md5 - -This occurs occasionally and only at tcg machine. It seems that -some pages dirtied in source side don't transferred to destination. -This problem can be reproduced even if we disable virtio. - -Is it OK for some pages that not transferred to destination when do -migration ? Or is it a bug? -I'm pretty sure that means it's a bug. 
Hard to find though, I guess -at least memtest is smaller than a big OS. I think I'd dump the whole -of memory on both sides, hexdump and diff them - I'd guess it would -just be one byte/word different, maybe that would offer some idea what -wrote it. -Maybe one better way to do that is with the help of userfaultfd's write-protect -capability. It is still in the development by Andrea Arcangeli, but there -is a RFC version available, please refer to -http://www.spinics.net/lists/linux-mm/msg97422.html -(I'm developing live memory snapshot which based on it, maybe this is another -scene where we -can use userfaultfd's WP ;) ). -Dave -Any idea... - -=================md5 check patch============================= - -diff --git a/Makefile.target b/Makefile.target -index 962d004..e2cb8e9 100644 ---- a/Makefile.target -+++ b/Makefile.target -@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o - obj-y += memory_mapping.o - obj-y += dump.o - obj-y += migration/ram.o migration/savevm.o --LIBS := $(libs_softmmu) $(LIBS) -+LIBS := $(libs_softmmu) $(LIBS) -lplumb - - # xen support - obj-$(CONFIG_XEN) += xen-common.o -diff --git a/migration/ram.c b/migration/ram.c -index 1eb155a..3b7a09d 100644 ---- a/migration/ram.c -+++ b/migration/ram.c -@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int -version_id) - } - - rcu_read_unlock(); -- DPRINTF("Completed load of VM with exit code %d seq iteration " -+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " - "%" PRIu64 "\n", ret, seq_iter); - return ret; - } -diff --git a/migration/savevm.c b/migration/savevm.c -index 0ad1b93..3feaa61 100644 ---- a/migration/savevm.c -+++ b/migration/savevm.c -@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) - - } - -+#include "exec/ram_addr.h" -+#include "qemu/rcu_queue.h" -+#include <clplumbing/md5.h> -+#ifndef MD5_DIGEST_LENGTH -+#define MD5_DIGEST_LENGTH 16 -+#endif -+ -+static void check_host_md5(void) -+{ -+ int i; -+ unsigned char
md[MD5_DIGEST_LENGTH]; -+ rcu_read_lock(); -+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check -'pc.ram' block */ -+ rcu_read_unlock(); -+ -+ MD5(block->host, block->used_length, md); -+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { -+ fprintf(stderr, "%02x", md[i]); -+ } -+ fprintf(stderr, "\n"); -+ error_report("end ram md5"); -+} -+ - void qemu_savevm_state_begin(QEMUFile *f, - const MigrationParams *params) - { -@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, -bool iterable_only) - save_section_header(f, se, QEMU_VM_SECTION_END); - - ret = se->ops->save_live_complete_precopy(f, se->opaque); -+ -+ fprintf(stderr, "after saving %s complete\n", se->idstr); -+ check_host_md5(); -+ - trace_savevm_section_end(se->idstr, se->section_id, ret); - save_section_footer(f, se); - if (ret < 0) { -@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, -MigrationIncomingState *mis) - section_id, le->se->idstr); - return ret; - } -+ if (section_type == QEMU_VM_SECTION_END) { -+ error_report("after loading state section id %d(%s)", -+ section_id, le->se->idstr); -+ check_host_md5(); -+ } - if (!check_section_footer(f, le)) { - return -EINVAL; - } -@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) - } - - cpu_synchronize_all_post_init(); -+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); -+ check_host_md5(); - - return ret; - } --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - -. - -On 12/03/2015 05:37 PM, Hailiang Zhang wrote: -On 2015/12/3 17:24, Dr. David Alan Gilbert wrote: -* Li Zhijian (address@hidden) wrote: -Hi all, - -Does anyboday remember the similar issue post by hailiang months ago -http://patchwork.ozlabs.org/patch/454322/ -At least tow bugs about migration had been fixed since that. -Yes, I wondered what happened to that. -And now we found the same issue at the tcg vm(kvm is fine), after -migration, -the content VM's memory is inconsistent. 
-Hmm, TCG only - I don't know much about that; but I guess something must -be accessing memory without using the proper macros/functions so -it doesn't mark it as dirty. -we add a patch to check memory content, you can find it from affix - -steps to reporduce: -1) apply the patch and re-build qemu -2) prepare the ubuntu guest and run memtest in grub. -soruce side: -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 - --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -pc-i440fx-2.3,accel=tcg,usb=off - -destination side: -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 - --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 - -3) start migration -with 1000M NIC, migration will finish within 3 min. - -at source: -(qemu) migrate tcp:192.168.2.66:8881 -after saving ram complete -e9e725df678d392b1a83b3a917f332bb -qemu-system-x86_64: end ram md5 -(qemu) - -at destination: -...skip... 
-Completed load of VM with exit code 0 seq iteration 1264 -Completed load of VM with exit code 0 seq iteration 1265 -Completed load of VM with exit code 0 seq iteration 1266 -qemu-system-x86_64: after loading state section id 2(ram) -49c2dac7bde0e5e22db7280dcb3824f9 -qemu-system-x86_64: end ram md5 -qemu-system-x86_64: qemu_loadvm_state: after -cpu_synchronize_all_post_init - -49c2dac7bde0e5e22db7280dcb3824f9 -qemu-system-x86_64: end ram md5 - -This occurs occasionally and only at tcg machine. It seems that -some pages dirtied in source side don't transferred to destination. -This problem can be reproduced even if we disable virtio. - -Is it OK for some pages that not transferred to destination when do -migration ? Or is it a bug? -I'm pretty sure that means it's a bug. Hard to find though, I guess -at least memtest is smaller than a big OS. I think I'd dump the whole -of memory on both sides, hexdump and diff them - I'd guess it would -just be one byte/word different, maybe that would offer some idea what -wrote it. -Maybe one better way to do that is with the help of userfaultfd's -write-protect -capability. It is still in the development by Andrea Arcangeli, but there -is a RFC version available, please refer to -http://www.spinics.net/lists/linux-mm/msg97422.html -(I'm developing live memory snapshot which based on it, maybe this is -another scene where we -can use userfaultfd's WP ;) ). -sounds good. - -thanks -Li -Dave -Any idea...
- -=================md5 check patch============================= - -diff --git a/Makefile.target b/Makefile.target -index 962d004..e2cb8e9 100644 ---- a/Makefile.target -+++ b/Makefile.target -@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o - obj-y += memory_mapping.o - obj-y += dump.o - obj-y += migration/ram.o migration/savevm.o --LIBS := $(libs_softmmu) $(LIBS) -+LIBS := $(libs_softmmu) $(LIBS) -lplumb - - # xen support - obj-$(CONFIG_XEN) += xen-common.o -diff --git a/migration/ram.c b/migration/ram.c -index 1eb155a..3b7a09d 100644 ---- a/migration/ram.c -+++ b/migration/ram.c -@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int -version_id) - } - - rcu_read_unlock(); -- DPRINTF("Completed load of VM with exit code %d seq iteration " -+ fprintf(stderr, "Completed load of VM with exit code %d seq -iteration " - "%" PRIu64 "\n", ret, seq_iter); - return ret; - } -diff --git a/migration/savevm.c b/migration/savevm.c -index 0ad1b93..3feaa61 100644 ---- a/migration/savevm.c -+++ b/migration/savevm.c -@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) - - } - -+#include "exec/ram_addr.h" -+#include "qemu/rcu_queue.h" -+#include <clplumbing/md5.h> -+#ifndef MD5_DIGEST_LENGTH -+#define MD5_DIGEST_LENGTH 16 -+#endif -+ -+static void check_host_md5(void) -+{ -+ int i; -+ unsigned char md[MD5_DIGEST_LENGTH]; -+ rcu_read_lock(); -+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check -'pc.ram' block */ -+ rcu_read_unlock(); -+ -+ MD5(block->host, block->used_length, md); -+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { -+ fprintf(stderr, "%02x", md[i]); -+ } -+ fprintf(stderr, "\n"); -+ error_report("end ram md5"); -+} -+ - void qemu_savevm_state_begin(QEMUFile *f, - const MigrationParams *params) - { -@@ -1056,6 +1079,10 @@ void -qemu_savevm_state_complete_precopy(QEMUFile *f, -bool iterable_only) - save_section_header(f, se, QEMU_VM_SECTION_END); - - ret = se->ops->save_live_complete_precopy(f, se->opaque); -+ -+ fprintf(stderr, 
"after saving %s complete\n", se->idstr); -+ check_host_md5(); -+ - trace_savevm_section_end(se->idstr, se->section_id, ret); - save_section_footer(f, se); - if (ret < 0) { -@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, -MigrationIncomingState *mis) - section_id, le->se->idstr); - return ret; - } -+ if (section_type == QEMU_VM_SECTION_END) { -+ error_report("after loading state section id %d(%s)", -+ section_id, le->se->idstr); -+ check_host_md5(); -+ } - if (!check_section_footer(f, le)) { - return -EINVAL; - } -@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) - } - - cpu_synchronize_all_post_init(); -+ error_report("%s: after cpu_synchronize_all_post_init\n", -__func__); -+ check_host_md5(); - - return ret; - } --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - -. -. --- -Best regards. -Li Zhijian (8555) - -On 12/03/2015 05:24 PM, Dr. David Alan Gilbert wrote: -* Li Zhijian (address@hidden) wrote: -Hi all, - -Does anyboday remember the similar issue post by hailiang months ago -http://patchwork.ozlabs.org/patch/454322/ -At least tow bugs about migration had been fixed since that. -Yes, I wondered what happened to that. -And now we found the same issue at the tcg vm(kvm is fine), after migration, -the content VM's memory is inconsistent. -Hmm, TCG only - I don't know much about that; but I guess something must -be accessing memory without using the proper macros/functions so -it doesn't mark it as dirty. -we add a patch to check memory content, you can find it from affix - -steps to reporduce: -1) apply the patch and re-build qemu -2) prepare the ubuntu guest and run memtest in grub. 
-soruce side: -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -pc-i440fx-2.3,accel=tcg,usb=off - -destination side: -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 - -3) start migration -with 1000M NIC, migration will finish within 3 min. - -at source: -(qemu) migrate tcp:192.168.2.66:8881 -after saving ram complete -e9e725df678d392b1a83b3a917f332bb -qemu-system-x86_64: end ram md5 -(qemu) - -at destination: -...skip... -Completed load of VM with exit code 0 seq iteration 1264 -Completed load of VM with exit code 0 seq iteration 1265 -Completed load of VM with exit code 0 seq iteration 1266 -qemu-system-x86_64: after loading state section id 2(ram) -49c2dac7bde0e5e22db7280dcb3824f9 -qemu-system-x86_64: end ram md5 -qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init - -49c2dac7bde0e5e22db7280dcb3824f9 -qemu-system-x86_64: end ram md5 - -This occurs occasionally and only at tcg machine. It seems that -some pages dirtied in source side don't transferred to destination. -This problem can be reproduced even if we disable virtio. - -Is it OK for some pages that not transferred to destination when do -migration ? Or is it a bug? -I'm pretty sure that means it's a bug. 
Hard to find though, I guess -at least memtest is smaller than a big OS. I think I'd dump the whole -of memory on both sides, hexdump and diff them - I'd guess it would -just be one byte/word different, maybe that would offer some idea what -wrote it. -I try to dump and compare them, more than 10 pages are different. -in source side, they are random value rather than always 'FF' 'FB' 'EF' -'BF'... in destination. -and not all of the different pages are continuous. - -thanks -Li -Dave -Any idea... - -=================md5 check patch============================= - -diff --git a/Makefile.target b/Makefile.target -index 962d004..e2cb8e9 100644 ---- a/Makefile.target -+++ b/Makefile.target -@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o - obj-y += memory_mapping.o - obj-y += dump.o - obj-y += migration/ram.o migration/savevm.o --LIBS := $(libs_softmmu) $(LIBS) -+LIBS := $(libs_softmmu) $(LIBS) -lplumb - - # xen support - obj-$(CONFIG_XEN) += xen-common.o -diff --git a/migration/ram.c b/migration/ram.c -index 1eb155a..3b7a09d 100644 ---- a/migration/ram.c -+++ b/migration/ram.c -@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int -version_id) - } - - rcu_read_unlock(); -- DPRINTF("Completed load of VM with exit code %d seq iteration " -+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " - "%" PRIu64 "\n", ret, seq_iter); - return ret; - } -diff --git a/migration/savevm.c b/migration/savevm.c -index 0ad1b93..3feaa61 100644 ---- a/migration/savevm.c -+++ b/migration/savevm.c -@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) - - } - -+#include "exec/ram_addr.h" -+#include "qemu/rcu_queue.h" -+#include <clplumbing/md5.h> -+#ifndef MD5_DIGEST_LENGTH -+#define MD5_DIGEST_LENGTH 16 -+#endif -+ -+static void check_host_md5(void) -+{ -+ int i; -+ unsigned char md[MD5_DIGEST_LENGTH]; -+ rcu_read_lock(); -+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check -'pc.ram' block */ -+ rcu_read_unlock(); -+ -+ 
MD5(block->host, block->used_length, md); -+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { -+ fprintf(stderr, "%02x", md[i]); -+ } -+ fprintf(stderr, "\n"); -+ error_report("end ram md5"); -+} -+ - void qemu_savevm_state_begin(QEMUFile *f, - const MigrationParams *params) - { -@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, -bool iterable_only) - save_section_header(f, se, QEMU_VM_SECTION_END); - - ret = se->ops->save_live_complete_precopy(f, se->opaque); -+ -+ fprintf(stderr, "after saving %s complete\n", se->idstr); -+ check_host_md5(); -+ - trace_savevm_section_end(se->idstr, se->section_id, ret); - save_section_footer(f, se); - if (ret < 0) { -@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, -MigrationIncomingState *mis) - section_id, le->se->idstr); - return ret; - } -+ if (section_type == QEMU_VM_SECTION_END) { -+ error_report("after loading state section id %d(%s)", -+ section_id, le->se->idstr); -+ check_host_md5(); -+ } - if (!check_section_footer(f, le)) { - return -EINVAL; - } -@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) - } - - cpu_synchronize_all_post_init(); -+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); -+ check_host_md5(); - - return ret; - } --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - - -. --- -Best regards. -Li Zhijian (8555) - -* Li Zhijian (address@hidden) wrote: -> -> -> -On 12/03/2015 05:24 PM, Dr. David Alan Gilbert wrote: -> ->* Li Zhijian (address@hidden) wrote: -> ->>Hi all, -> ->> -> ->>Does anyboday remember the similar issue post by hailiang months ago -> ->> -http://patchwork.ozlabs.org/patch/454322/ -> ->>At least tow bugs about migration had been fixed since that. -> -> -> ->Yes, I wondered what happened to that. -> -> -> ->>And now we found the same issue at the tcg vm(kvm is fine), after migration, -> ->>the content VM's memory is inconsistent. 
-> -> -> ->Hmm, TCG only - I don't know much about that; but I guess something must -> ->be accessing memory without using the proper macros/functions so -> ->it doesn't mark it as dirty. -> -> -> ->>we add a patch to check memory content, you can find it from affix -> ->> -> ->>steps to reporduce: -> ->>1) apply the patch and re-build qemu -> ->>2) prepare the ubuntu guest and run memtest in grub. -> ->>soruce side: -> ->>x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -> ->>e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -> ->>if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -> ->>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -> ->>-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -> ->>tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -> ->>pc-i440fx-2.3,accel=tcg,usb=off -> ->> -> ->>destination side: -> ->>x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -> ->>e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -> ->>if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -> ->>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -> ->>-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -> ->>tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -> ->>pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 -> ->> -> ->>3) start migration -> ->>with 1000M NIC, migration will finish within 3 min. -> ->> -> ->>at source: -> ->>(qemu) migrate tcp:192.168.2.66:8881 -> ->>after saving ram complete -> ->>e9e725df678d392b1a83b3a917f332bb -> ->>qemu-system-x86_64: end ram md5 -> ->>(qemu) -> ->> -> ->>at destination: -> ->>...skip... 
-> ->>Completed load of VM with exit code 0 seq iteration 1264 -> ->>Completed load of VM with exit code 0 seq iteration 1265 -> ->>Completed load of VM with exit code 0 seq iteration 1266 -> ->>qemu-system-x86_64: after loading state section id 2(ram) -> ->>49c2dac7bde0e5e22db7280dcb3824f9 -> ->>qemu-system-x86_64: end ram md5 -> ->>qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init -> ->> -> ->>49c2dac7bde0e5e22db7280dcb3824f9 -> ->>qemu-system-x86_64: end ram md5 -> ->> -> ->>This occurs occasionally and only at tcg machine. It seems that -> ->>some pages dirtied in source side don't transferred to destination. -> ->>This problem can be reproduced even if we disable virtio. -> ->> -> ->>Is it OK for some pages that not transferred to destination when do -> ->>migration ? Or is it a bug? -> -> -> ->I'm pretty sure that means it's a bug. Hard to find though, I guess -> ->at least memtest is smaller than a big OS. I think I'd dump the whole -> ->of memory on both sides, hexdump and diff them - I'd guess it would -> ->just be one byte/word different, maybe that would offer some idea what -> ->wrote it. -> -> -I try to dump and compare them, more than 10 pages are different. -> -in source side, they are random value rather than always 'FF' 'FB' 'EF' -> -'BF'... in destination. -> -> -and not all of the different pages are continuous. -I wonder if it happens on all of memtest's different test patterns, -perhaps it might be possible to narrow it down if you tell memtest -to only run one test at a time. - -Dave - -> -> -thanks -> -Li -> -> -> -> -> ->Dave -> -> -> ->>Any idea... 
-> ->> -> ->>=================md5 check patch============================= -> ->> -> ->>diff --git a/Makefile.target b/Makefile.target -> ->>index 962d004..e2cb8e9 100644 -> ->>--- a/Makefile.target -> ->>+++ b/Makefile.target -> ->>@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o -> ->> obj-y += memory_mapping.o -> ->> obj-y += dump.o -> ->> obj-y += migration/ram.o migration/savevm.o -> ->>-LIBS := $(libs_softmmu) $(LIBS) -> ->>+LIBS := $(libs_softmmu) $(LIBS) -lplumb -> ->> -> ->> # xen support -> ->> obj-$(CONFIG_XEN) += xen-common.o -> ->>diff --git a/migration/ram.c b/migration/ram.c -> ->>index 1eb155a..3b7a09d 100644 -> ->>--- a/migration/ram.c -> ->>+++ b/migration/ram.c -> ->>@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int -> ->>version_id) -> ->> } -> ->> -> ->> rcu_read_unlock(); -> ->>- DPRINTF("Completed load of VM with exit code %d seq iteration " -> ->>+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " -> ->> "%" PRIu64 "\n", ret, seq_iter); -> ->> return ret; -> ->> } -> ->>diff --git a/migration/savevm.c b/migration/savevm.c -> ->>index 0ad1b93..3feaa61 100644 -> ->>--- a/migration/savevm.c -> ->>+++ b/migration/savevm.c -> ->>@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) -> ->> -> ->> } -> ->> -> ->>+#include "exec/ram_addr.h" -> ->>+#include "qemu/rcu_queue.h" -> ->>+#include <clplumbing/md5.h> -> ->>+#ifndef MD5_DIGEST_LENGTH -> ->>+#define MD5_DIGEST_LENGTH 16 -> ->>+#endif -> ->>+ -> ->>+static void check_host_md5(void) -> ->>+{ -> ->>+ int i; -> ->>+ unsigned char md[MD5_DIGEST_LENGTH]; -> ->>+ rcu_read_lock(); -> ->>+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check -> ->>'pc.ram' block */ -> ->>+ rcu_read_unlock(); -> ->>+ -> ->>+ MD5(block->host, block->used_length, md); -> ->>+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { -> ->>+ fprintf(stderr, "%02x", md[i]); -> ->>+ } -> ->>+ fprintf(stderr, "\n"); -> ->>+ error_report("end ram md5"); -> ->>+} -> ->>+ -> ->> 
void qemu_savevm_state_begin(QEMUFile *f, -> ->> const MigrationParams *params) -> ->> { -> ->>@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, -> ->>bool iterable_only) -> ->> save_section_header(f, se, QEMU_VM_SECTION_END); -> ->> -> ->> ret = se->ops->save_live_complete_precopy(f, se->opaque); -> ->>+ -> ->>+ fprintf(stderr, "after saving %s complete\n", se->idstr); -> ->>+ check_host_md5(); -> ->>+ -> ->> trace_savevm_section_end(se->idstr, se->section_id, ret); -> ->> save_section_footer(f, se); -> ->> if (ret < 0) { -> ->>@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, -> ->>MigrationIncomingState *mis) -> ->> section_id, le->se->idstr); -> ->> return ret; -> ->> } -> ->>+ if (section_type == QEMU_VM_SECTION_END) { -> ->>+ error_report("after loading state section id %d(%s)", -> ->>+ section_id, le->se->idstr); -> ->>+ check_host_md5(); -> ->>+ } -> ->> if (!check_section_footer(f, le)) { -> ->> return -EINVAL; -> ->> } -> ->>@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) -> ->> } -> ->> -> ->> cpu_synchronize_all_post_init(); -> ->>+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); -> ->>+ check_host_md5(); -> ->> -> ->> return ret; -> ->> } -> ->> -> ->> -> ->> -> ->-- -> ->Dr. David Alan Gilbert / address@hidden / Manchester, UK -> -> -> -> -> ->. -> -> -> -> --- -> -Best regards. -> -Li Zhijian (8555) -> -> --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - -Li Zhijian <address@hidden> wrote: -> -Hi all, -> -> -Does anyboday remember the similar issue post by hailiang months ago -> -http://patchwork.ozlabs.org/patch/454322/ -> -At least tow bugs about migration had been fixed since that. -> -> -And now we found the same issue at the tcg vm(kvm is fine), after -> -migration, the content VM's memory is inconsistent. 
-> -> -we add a patch to check memory content, you can find it from affix -> -> -steps to reporduce: -> -1) apply the patch and re-build qemu -> -2) prepare the ubuntu guest and run memtest in grub. -> -soruce side: -> -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -> -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -> -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -> -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -> --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -> -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -> -pc-i440fx-2.3,accel=tcg,usb=off -> -> -destination side: -> -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -> -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -> -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -> -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -> --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -> -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -> -pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 -> -> -3) start migration -> -with 1000M NIC, migration will finish within 3 min. -> -> -at source: -> -(qemu) migrate tcp:192.168.2.66:8881 -> -after saving ram complete -> -e9e725df678d392b1a83b3a917f332bb -> -qemu-system-x86_64: end ram md5 -> -(qemu) -> -> -at destination: -> -...skip... -> -Completed load of VM with exit code 0 seq iteration 1264 -> -Completed load of VM with exit code 0 seq iteration 1265 -> -Completed load of VM with exit code 0 seq iteration 1266 -> -qemu-system-x86_64: after loading state section id 2(ram) -> -49c2dac7bde0e5e22db7280dcb3824f9 -> -qemu-system-x86_64: end ram md5 -> -qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init -> -> -49c2dac7bde0e5e22db7280dcb3824f9 -> -qemu-system-x86_64: end ram md5 -> -> -This occurs occasionally and only at tcg machine. 
It seems that -> -some pages dirtied in source side don't transferred to destination. -> -This problem can be reproduced even if we disable virtio. -> -> -Is it OK for some pages that not transferred to destination when do -> -migration ? Or is it a bug? -> -> -Any idea... -Thanks for describing how to reproduce the bug. -If some pages are not transferred to destination then it is a bug, so we -need to know what the problem is, notice that the problem can be that -TCG is not marking dirty some page, that Migration code "forgets" about -that page, or anything eles altogether, that is what we need to find. - -There are more posibilities, I am not sure that memtest is on 32bit -mode, and it is inside posibility that we are missing some state when we -are on real mode. - -Will try to take a look at this. - -THanks, again. - - -> -> -=================md5 check patch============================= -> -> -diff --git a/Makefile.target b/Makefile.target -> -index 962d004..e2cb8e9 100644 -> ---- a/Makefile.target -> -+++ b/Makefile.target -> -@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o -> -obj-y += memory_mapping.o -> -obj-y += dump.o -> -obj-y += migration/ram.o migration/savevm.o -> --LIBS := $(libs_softmmu) $(LIBS) -> -+LIBS := $(libs_softmmu) $(LIBS) -lplumb -> -> -# xen support -> -obj-$(CONFIG_XEN) += xen-common.o -> -diff --git a/migration/ram.c b/migration/ram.c -> -index 1eb155a..3b7a09d 100644 -> ---- a/migration/ram.c -> -+++ b/migration/ram.c -> -@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, -> -int version_id) -> -} -> -> -rcu_read_unlock(); -> -- DPRINTF("Completed load of VM with exit code %d seq iteration " -> -+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " -> -"%" PRIu64 "\n", ret, seq_iter); -> -return ret; -> -} -> -diff --git a/migration/savevm.c b/migration/savevm.c -> -index 0ad1b93..3feaa61 100644 -> ---- a/migration/savevm.c -> -+++ b/migration/savevm.c -> -@@ -891,6 +891,29 @@ void 
qemu_savevm_state_header(QEMUFile *f) -> -> -} -> -> -+#include "exec/ram_addr.h" -> -+#include "qemu/rcu_queue.h" -> -+#include <clplumbing/md5.h> -> -+#ifndef MD5_DIGEST_LENGTH -> -+#define MD5_DIGEST_LENGTH 16 -> -+#endif -> -+ -> -+static void check_host_md5(void) -> -+{ -> -+ int i; -> -+ unsigned char md[MD5_DIGEST_LENGTH]; -> -+ rcu_read_lock(); -> -+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check -> -'pc.ram' block */ -> -+ rcu_read_unlock(); -> -+ -> -+ MD5(block->host, block->used_length, md); -> -+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { -> -+ fprintf(stderr, "%02x", md[i]); -> -+ } -> -+ fprintf(stderr, "\n"); -> -+ error_report("end ram md5"); -> -+} -> -+ -> -void qemu_savevm_state_begin(QEMUFile *f, -> -const MigrationParams *params) -> -{ -> -@@ -1056,6 +1079,10 @@ void -> -qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only) -> -save_section_header(f, se, QEMU_VM_SECTION_END); -> -> -ret = se->ops->save_live_complete_precopy(f, se->opaque); -> -+ -> -+ fprintf(stderr, "after saving %s complete\n", se->idstr); -> -+ check_host_md5(); -> -+ -> -trace_savevm_section_end(se->idstr, se->section_id, ret); -> -save_section_footer(f, se); -> -if (ret < 0) { -> -@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, -> -MigrationIncomingState *mis) -> -section_id, le->se->idstr); -> -return ret; -> -} -> -+ if (section_type == QEMU_VM_SECTION_END) { -> -+ error_report("after loading state section id %d(%s)", -> -+ section_id, le->se->idstr); -> -+ check_host_md5(); -> -+ } -> -if (!check_section_footer(f, le)) { -> -return -EINVAL; -> -} -> -@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) -> -} -> -> -cpu_synchronize_all_post_init(); -> -+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); -> -+ check_host_md5(); -> -> -return ret; -> -} - -> -> -Thanks for describing how to reproduce the bug. 
-> -If some pages are not transferred to destination then it is a bug, so we need -> -to know what the problem is, notice that the problem can be that TCG is not -> -marking dirty some page, that Migration code "forgets" about that page, or -> -anything eles altogether, that is what we need to find. -> -> -There are more posibilities, I am not sure that memtest is on 32bit mode, and -> -it is inside posibility that we are missing some state when we are on real -> -mode. -> -> -Will try to take a look at this. -> -> -THanks, again. -> -Hi Juan & Amit - - Do you think we should add a mechanism to check the data integrity during LM -like Zhijian's patch did? it may be very helpful for developers. - Actually, I did the similar thing before in order to make sure that I did the -right thing we I change the code related to LM. - -Liang - -On (Fri) 04 Dec 2015 [01:43:07], Li, Liang Z wrote: -> -> -> -> Thanks for describing how to reproduce the bug. -> -> If some pages are not transferred to destination then it is a bug, so we -> -> need -> -> to know what the problem is, notice that the problem can be that TCG is not -> -> marking dirty some page, that Migration code "forgets" about that page, or -> -> anything eles altogether, that is what we need to find. -> -> -> -> There are more posibilities, I am not sure that memtest is on 32bit mode, -> -> and -> -> it is inside posibility that we are missing some state when we are on real -> -> mode. -> -> -> -> Will try to take a look at this. -> -> -> -> THanks, again. -> -> -> -> -Hi Juan & Amit -> -> -Do you think we should add a mechanism to check the data integrity during LM -> -like Zhijian's patch did? it may be very helpful for developers. -> -Actually, I did the similar thing before in order to make sure that I did -> -the right thing we I change the code related to LM. -If you mean for debugging, something that's not always on, then I'm -fine with it. 
- -A script that goes along that shows the result of comparison of the -diff will be helpful too, something that shows how many pages are -differnt, how many bytes in a page on average, and so on. - - Amit - diff --git a/results/classifier/016/virtual/79834768 b/results/classifier/016/virtual/79834768 deleted file mode 100644 index 16bb6142..00000000 --- a/results/classifier/016/virtual/79834768 +++ /dev/null @@ -1,436 +0,0 @@ -virtual: 0.981 -KVM: 0.958 -debug: 0.901 -x86: 0.830 -operating system: 0.687 -hypervisor: 0.440 -kernel: 0.177 -register: 0.064 -performance: 0.042 -user-level: 0.026 -i386: 0.021 -assembly: 0.018 -files: 0.016 -VMM: 0.010 -PID: 0.010 -device: 0.009 -TCG: 0.008 -socket: 0.008 -semantic: 0.008 -architecture: 0.006 -vnc: 0.006 -risc-v: 0.006 -arm: 0.006 -peripherals: 0.004 -graphic: 0.003 -permissions: 0.003 -network: 0.003 -ppc: 0.002 -alpha: 0.002 -boot: 0.002 -mistranslation: 0.002 - -[Qemu-devel] [BUG] Windows 7 got stuck easily while run PCMark10 application - -Hi, - -We hit a bug in our test while run PCMark 10 in a windows 7 VM, -The VM got stuck and the wallclock was hang after several minutes running -PCMark 10 in it. -It is quite easily to reproduce the bug with the upstream KVM and Qemu. - -We found that KVM can not inject any RTC irq to VM after it was hang, it fails -to -Deliver irq in ioapic_set_irq() because RTC irq is still pending in ioapic->irr. - -static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq, - int irq_level, bool line_status) -{ -... ... - if (!irq_level) { - ioapic->irr &= ~mask; - ret = 1; - goto out; - } -... ... - if ((edge && old_irr == ioapic->irr) || - (!edge && entry.fields.remote_irr)) { - ret = 0; - goto out; - } - -According to RTC spec, after RTC injects a High level irq, OS will read CMOS's -register C to to clear the irq flag, and pull down the irq electric pin. 
- -For Qemu, we will emulate the reading operation in cmos_ioport_read(), -but Guest OS will fire a write operation before to tell which register will be -read -after this write, where we use s->cmos_index to record the following register -to read. - -But in our test, we found that there is a possible situation that Vcpu fails to -read -RTC_REG_C to clear irq, This could happens while two VCpus are writing/reading -registers at the same time, for example, vcpu 0 is trying to read RTC_REG_C, -so it write RTC_REG_C first, where the s->cmos_index will be RTC_REG_C, -but before it tries to read register C, another vcpu1 is going to read RTC_YEAR, -it changes s->cmos_index to RTC_YEAR by a writing action. -The next operation of vcpu0 will be lead to read RTC_YEAR, In this case, we -will miss -calling qemu_irq_lower(s->irq) to clear the irq. After this, kvm will never -inject RTC irq, -and Windows VM will hang. -static void cmos_ioport_write(void *opaque, hwaddr addr, - uint64_t data, unsigned size) -{ - RTCState *s = opaque; - - if ((addr & 1) == 0) { - s->cmos_index = data & 0x7f; - } -...... -static uint64_t cmos_ioport_read(void *opaque, hwaddr addr, - unsigned size) -{ - RTCState *s = opaque; - int ret; - if ((addr & 1) == 0) { - return 0xff; - } else { - switch(s->cmos_index) { - -According to CMOS spec, "any write to PROT 0070h should be followed by an -action to PROT 0071h or the RTC -Will be RTC will be left in an unknown state", but it seems that we can not -ensure this sequence in qemu/kvm. - -Any ideas ? - -Thanks, -Hailiang - -Pls see the trace of kvm_pio: - - CPU 1/KVM-15567 [003] .... 209311.762579: kvm_pio: pio_read at 0x70 size -1 count 1 val 0xff - CPU 1/KVM-15567 [003] .... 209311.762582: kvm_pio: pio_write at 0x70 -size 1 count 1 val 0x89 - CPU 1/KVM-15567 [003] .... 209311.762590: kvm_pio: pio_read at 0x71 size -1 count 1 val 0x17 - CPU 0/KVM-15566 [005] .... 209311.762611: kvm_pio: pio_write at 0x70 -size 1 count 1 val 0xc - CPU 1/KVM-15567 [003] .... 
209311.762615: kvm_pio: pio_read at 0x70 size -1 count 1 val 0xff - CPU 1/KVM-15567 [003] .... 209311.762619: kvm_pio: pio_write at 0x70 -size 1 count 1 val 0x88 - CPU 1/KVM-15567 [003] .... 209311.762627: kvm_pio: pio_read at 0x71 size -1 count 1 val 0x12 - CPU 0/KVM-15566 [005] .... 209311.762632: kvm_pio: pio_read at 0x71 size -1 count 1 val 0x12 - CPU 1/KVM-15567 [003] .... 209311.762633: kvm_pio: pio_read at 0x70 size -1 count 1 val 0xff - CPU 0/KVM-15566 [005] .... 209311.762634: kvm_pio: pio_write at 0x70 -size 1 count 1 val 0xc <--- Firstly write to 0x70, cmo_index = 0xc & -0x7f = 0xc - CPU 1/KVM-15567 [003] .... 209311.762636: kvm_pio: pio_write at 0x70 -size 1 count 1 val 0x86 <-- Secondly write to 0x70, cmo_index = 0x86 & -0x7f = 0x6, cover the cmo_index result of first time - CPU 0/KVM-15566 [005] .... 209311.762641: kvm_pio: pio_read at 0x71 size -1 count 1 val 0x6 <-- vcpu0 read 0x6 because cmo_index is 0x6 now - CPU 1/KVM-15567 [003] .... 209311.762644: kvm_pio: pio_read at 0x71 size -1 count 1 val 0x6 <- vcpu1 read 0x6 - CPU 1/KVM-15567 [003] .... 209311.762649: kvm_pio: pio_read at 0x70 size -1 count 1 val 0xff - CPU 1/KVM-15567 [003] .... 209311.762669: kvm_pio: pio_write at 0x70 -size 1 count 1 val 0x87 - CPU 1/KVM-15567 [003] .... 209311.762678: kvm_pio: pio_read at 0x71 size -1 count 1 val 0x1 - CPU 1/KVM-15567 [003] .... 209311.762683: kvm_pio: pio_read at 0x70 size -1 count 1 val 0xff - CPU 1/KVM-15567 [003] .... 209311.762686: kvm_pio: pio_write at 0x70 -size 1 count 1 val 0x84 - CPU 1/KVM-15567 [003] .... 209311.762693: kvm_pio: pio_read at 0x71 size -1 count 1 val 0x10 - CPU 1/KVM-15567 [003] .... 209311.762699: kvm_pio: pio_read at 0x70 size -1 count 1 val 0xff - CPU 1/KVM-15567 [003] .... 209311.762702: kvm_pio: pio_write at 0x70 -size 1 count 1 val 0x82 - CPU 1/KVM-15567 [003] .... 209311.762709: kvm_pio: pio_read at 0x71 size -1 count 1 val 0x25 - CPU 1/KVM-15567 [003] .... 
209311.762714: kvm_pio: pio_read at 0x70 size -1 count 1 val 0xff - CPU 1/KVM-15567 [003] .... 209311.762717: kvm_pio: pio_write at 0x70 -size 1 count 1 val 0x80 - - -Regards, --Gonglei - -From: Zhanghailiang -Sent: Friday, December 01, 2017 3:03 AM -To: address@hidden; address@hidden; Paolo Bonzini -Cc: Huangweidong (C); Gonglei (Arei); wangxin (U); Xiexiangyou -Subject: [BUG] Windows 7 got stuck easily while run PCMark10 application - -Hi, - -We hit a bug in our test while run PCMark 10 in a windows 7 VM, -The VM got stuck and the wallclock was hang after several minutes running -PCMark 10 in it. -It is quite easily to reproduce the bug with the upstream KVM and Qemu. - -We found that KVM can not inject any RTC irq to VM after it was hang, it fails -to -Deliver irq in ioapic_set_irq() because RTC irq is still pending in ioapic->irr. - -static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq, - int irq_level, bool line_status) -{ -... ... - if (!irq_level) { - ioapic->irr &= ~mask; - ret = 1; - goto out; - } -... ... - if ((edge && old_irr == ioapic->irr) || - (!edge && entry.fields.remote_irr)) { - ret = 0; - goto out; - } - -According to RTC spec, after RTC injects a High level irq, OS will read CMOS's -register C to to clear the irq flag, and pull down the irq electric pin. - -For Qemu, we will emulate the reading operation in cmos_ioport_read(), -but Guest OS will fire a write operation before to tell which register will be -read -after this write, where we use s->cmos_index to record the following register -to read. 
- -But in our test, we found that there is a possible situation that Vcpu fails to -read -RTC_REG_C to clear irq, This could happens while two VCpus are writing/reading -registers at the same time, for example, vcpu 0 is trying to read RTC_REG_C, -so it write RTC_REG_C first, where the s->cmos_index will be RTC_REG_C, -but before it tries to read register C, another vcpu1 is going to read RTC_YEAR, -it changes s->cmos_index to RTC_YEAR by a writing action. -The next operation of vcpu0 will be lead to read RTC_YEAR, In this case, we -will miss -calling qemu_irq_lower(s->irq) to clear the irq. After this, kvm will never -inject RTC irq, -and Windows VM will hang. -static void cmos_ioport_write(void *opaque, hwaddr addr, - uint64_t data, unsigned size) -{ - RTCState *s = opaque; - - if ((addr & 1) == 0) { - s->cmos_index = data & 0x7f; - } -...... -static uint64_t cmos_ioport_read(void *opaque, hwaddr addr, - unsigned size) -{ - RTCState *s = opaque; - int ret; - if ((addr & 1) == 0) { - return 0xff; - } else { - switch(s->cmos_index) { - -According to CMOS spec, "any write to PROT 0070h should be followed by an -action to PROT 0071h or the RTC -Will be RTC will be left in an unknown state", but it seems that we can not -ensure this sequence in qemu/kvm. - -Any ideas ? - -Thanks, -Hailiang - -On 01/12/2017 08:08, Gonglei (Arei) wrote: -> -First write to 0x70, cmos_index = 0xc & 0x7f = 0xc -> - CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc> -> -Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6> CPU 1/KVM-15567 -> -kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86> vcpu0 read 0x6 because -> -cmos_index is 0x6 now:> CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size -> -1 count 1 val 0x6> vcpu1 read 0x6:> CPU 1/KVM-15567 kvm_pio: pio_read -> -at 0x71 size 1 count 1 val 0x6 -This seems to be a Windows bug. 
The easiest workaround that I -can think of is to clear the interrupts already when 0xc is written, -without waiting for the read (because REG_C can only be read). - -What do you think? - -Thanks, - -Paolo - -I also think it's windows bug, the problem is that it doesn't occur on xen -platform. And there are some other works need to be done while reading REG_C. -So I wrote that patch. - -Thanks, -Gonglei -From: Paolo Bonzini -To: Gonglei, Zhanghailiang, qemu-devel, Michael S. Tsirkin -Cc: Huangweidong, wangxin, Xiexiangyou -Date: 2017-12-02 01:10:08 -Subject: Re: [BUG] Windows 7 got stuck easily while run PCMark10 application - -On 01/12/2017 08:08, Gonglei (Arei) wrote: -> -First write to 0x70, cmos_index = 0xc & 0x7f = 0xc -> -CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc> -> -Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6> CPU 1/KVM-15567 -> -kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86> vcpu0 read 0x6 because -> -cmos_index is 0x6 now:> CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size -> -1 count 1 val 0x6> vcpu1 read 0x6:> CPU 1/KVM-15567 kvm_pio: pio_read -> -at 0x71 size 1 count 1 val 0x6 -This seems to be a Windows bug. The easiest workaround that I -can think of is to clear the interrupts already when 0xc is written, -without waiting for the read (because REG_C can only be read). - -What do you think? - -Thanks, - -Paolo - -On 01/12/2017 18:45, Gonglei (Arei) wrote: -> -I also think it's windows bug, the problem is that it doesn't occur on -> -xen platform. -It's a race, it may just be that RTC PIO is faster in Xen because it's -implemented in the hypervisor. - -I will try reporting it to Microsoft. - -Thanks, - -Paolo - -> -Thanks, -> -Gonglei -> -*From:*Paolo Bonzini -> -*To:*Gonglei, Zhanghailiang, qemu-devel, Michael S. 
Tsirkin -> -*Cc:* é»ä¼æ ,çæ¬£,谢祥æ -> -*Date:* 2017-12-02 01:10:08 -> -*Subject:* Re: [BUG] Windows 7 got stuck easily while run PCMark10 application -> -> -On 01/12/2017 08:08, Gonglei (Arei) wrote: -> -> First write to 0x70, cmos_index = 0xc & 0x7f = 0xc -> -> CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc -> -> Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6 -> -> CPU 1/KVM-15567 kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86 -> -> vcpu0 read 0x6 because cmos_index is 0x6 now: -> -> CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6 -> -> vcpu1 read 0x6: -> -> CPU 1/KVM-15567 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6 -> -This seems to be a Windows bug. The easiest workaround that I -> -can think of is to clear the interrupts already when 0xc is written, -> -without waiting for the read (because REG_C can only be read). -> -> -What do you think? -> -> -Thanks, -> -> -Paolo - -On 2017/12/2 2:37, Paolo Bonzini wrote: -On 01/12/2017 18:45, Gonglei (Arei) wrote: -I also think it's a Windows bug; the problem is that it doesn't occur on -the Xen platform. -It's a race, it may just be that RTC PIO is faster in Xen because it's -implemented in the hypervisor. -No, Xen does not have this problem because it injects the RTC irq without -checking whether the previous irq has been cleared, whereas KVM does perform -that check. - -static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq, - int irq_level, bool line_status) -{ - ... ... - if (!irq_level) { - ioapic->irr &= ~mask; /* clear the RTC irq in irr, or we cannot -inject the RTC irq */ - ret = 1; - goto out; - } - -I agree that we should move the operation of clearing the RTC irq from cmos_ioport_read() -to cmos_ioport_write() to ensure the action is done. - -Thanks, -Hailiang -I will try reporting it to Microsoft. - -Thanks, - -Paolo -Thanks, -Gonglei -*From:* Paolo Bonzini -*To:* 龚磊,张海亮,qemu-devel,Michael S. 
Tsirkin -*Cc:* é»ä¼æ ,çæ¬£,谢祥æ -*Date:* 2017-12-02 01:10:08 -*Subject:* Re: [BUG] Windows 7 got stuck easily while run PCMark10 application - -On 01/12/2017 08:08, Gonglei (Arei) wrote: -First write to 0x70, cmos_index = 0xc & 0x7f = 0xc -CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc -Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6 -CPU 1/KVM-15567 kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86 -vcpu0 read 0x6 because cmos_index is 0x6 now: -CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6 -vcpu1 read 0x6: -CPU 1/KVM-15567 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6 -This seems to be a Windows bug. The easiest workaround that I -can think of is to clear the interrupts already when 0xc is written, -without waiting for the read (because REG_C can only be read). - -What do you think? - -Thanks, - -Paolo -. - |