From e5634e2806195bee44407853c4bf8776f7abfa4f Mon Sep 17 00:00:00 2001
From: Christian Krinitsin
Date: Sun, 1 Jun 2025 21:19:55 +0200
Subject: add the outputs of the first five revisions of the classifier
---
 classification_output/01/mistranslation/0247400  | 1486 -------
 classification_output/01/mistranslation/1267916  | 1878 ---------
 classification_output/01/mistranslation/14887122 |  258 ++
 classification_output/01/mistranslation/1693040  | 1061 -----
 classification_output/01/mistranslation/22219210 |   43 +
 classification_output/01/mistranslation/23270873 |  692 ++++
 classification_output/01/mistranslation/24930826 |   33 +
 classification_output/01/mistranslation/25842545 |  202 +
 classification_output/01/mistranslation/26430026 |  165 +
 classification_output/01/mistranslation/36568044 | 4581 ++++++++++++++++++++++
 classification_output/01/mistranslation/3886413  |   33 -
 classification_output/01/mistranslation/4158985  | 1480 -------
 classification_output/01/mistranslation/4412535  |  348 --
 classification_output/01/mistranslation/5373318  |  692 ----
 classification_output/01/mistranslation/5798945  |   43 -
 classification_output/01/mistranslation/5933279  | 4581 ----------------------
 classification_output/01/mistranslation/6178292  |  258 --
 classification_output/01/mistranslation/64322995 |   54 +
 classification_output/01/mistranslation/6866700  |   54 -
 classification_output/01/mistranslation/70294255 | 1061 +++++
 classification_output/01/mistranslation/71456293 | 1486 +++++++
 classification_output/01/mistranslation/74466963 | 1878 ++++++++
 classification_output/01/mistranslation/74545755 |  344 ++
 classification_output/01/mistranslation/7711787  |  165 -
 classification_output/01/mistranslation/80604314 | 1480 +++++++
 classification_output/01/mistranslation/80615920 |  348 ++
 classification_output/01/mistranslation/8720260  |  344 --
 classification_output/01/mistranslation/8874178  |  202 -
 28 files changed, 12625 insertions(+), 12625 deletions(-)
 delete mode 100644 classification_output/01/mistranslation/0247400
 delete mode 100644 classification_output/01/mistranslation/1267916
 create mode 100644 classification_output/01/mistranslation/14887122
 delete mode 100644 classification_output/01/mistranslation/1693040
 create mode 100644 classification_output/01/mistranslation/22219210
 create mode 100644 classification_output/01/mistranslation/23270873
 create mode 100644 classification_output/01/mistranslation/24930826
 create mode 100644 classification_output/01/mistranslation/25842545
 create mode 100644 classification_output/01/mistranslation/26430026
 create mode 100644 classification_output/01/mistranslation/36568044
 delete mode 100644 classification_output/01/mistranslation/3886413
 delete mode 100644 classification_output/01/mistranslation/4158985
 delete mode 100644 classification_output/01/mistranslation/4412535
 delete mode 100644 classification_output/01/mistranslation/5373318
 delete mode 100644 classification_output/01/mistranslation/5798945
 delete mode 100644 classification_output/01/mistranslation/5933279
 delete mode 100644 classification_output/01/mistranslation/6178292
 create mode 100644 classification_output/01/mistranslation/64322995
 delete mode 100644 classification_output/01/mistranslation/6866700
 create mode 100644 classification_output/01/mistranslation/70294255
 create mode 100644 classification_output/01/mistranslation/71456293
 create mode 100644 classification_output/01/mistranslation/74466963
 create mode 100644 classification_output/01/mistranslation/74545755
 delete mode 100644 classification_output/01/mistranslation/7711787
 create mode 100644 classification_output/01/mistranslation/80604314
 create mode 100644 classification_output/01/mistranslation/80615920
 delete mode 100644 classification_output/01/mistranslation/8720260
 delete mode 100644 classification_output/01/mistranslation/8874178

(limited to 'classification_output/01/mistranslation')

diff --git a/classification_output/01/mistranslation/0247400
b/classification_output/01/mistranslation/0247400
deleted file mode 100644
index 746a624c..00000000
--- a/classification_output/01/mistranslation/0247400
+++ /dev/null
@@ -1,1486 +0,0 @@
-mistranslation: 0.659
-instruction: 0.624
-semantic: 0.600
-other: 0.598
-
-[Qemu-devel][bug] qemu crash when migrate vm and vm's disks
-
-When migrating a VM and the VM’s disks, the target host's QEMU crashes due to an invalid free.
-#0  object_unref (obj=0x1000) at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/qom/object.c:920
-#1  0x0000560434d79e79 in memory_region_unref (mr=)
-    at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:1730
-#2  flatview_destroy (view=0x560439653880) at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:292
-#3  0x000056043514dfbe in call_rcu_thread (opaque=)
-    at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/util/rcu.c:284
-#4  0x00007fbc2b36fe25 in start_thread () from /lib64/libpthread.so.0
-#5  0x00007fbc2b099bad in clone () from /lib64/libc.so.6
-Tested against base qemu-2.12.0, but it also reproduces with the latest qemu (v6.0.0-rc2).
-The following patch can resolve this problem:
-https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02272.html
-Steps to reproduce:
-(1) Create VM (virsh define)
-(2) Add 64 virtio scsi disks
-(3) Migrate the VM and the VM’s disks
--------------------------------------------------------------------------------------------------------------------------------------
-This e-mail and its attachments contain confidential information from New H3C, which is
-intended only for the person or entity whose address is listed above. Any use of the
-information contained herein in any way (including, but not limited to, total or partial
-disclosure, reproduction, or dissemination) by persons other than the intended
-recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
-by phone or email immediately and delete it!
-
-* Yuchen (yu.chen@h3c.com) wrote:
-> When migrating a VM and the VM’s disks, the target host's QEMU crashes due to an invalid free.
->
-> #0  object_unref (obj=0x1000) at
-> /qemu-2.12/rpmbuild/BUILD/qemu-2.12/qom/object.c:920
-> #1  0x0000560434d79e79 in memory_region_unref (mr=)
->     at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:1730
-> #2  flatview_destroy (view=0x560439653880) at
-> /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:292
-> #3  0x000056043514dfbe in call_rcu_thread (opaque=)
->     at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/util/rcu.c:284
-> #4  0x00007fbc2b36fe25 in start_thread () from /lib64/libpthread.so.0
-> #5  0x00007fbc2b099bad in clone () from /lib64/libc.so.6
->
-> Tested against base qemu-2.12.0, but it also reproduces with the latest qemu (v6.0.0-rc2).
-
-Interesting.
-
-> The following patch can resolve this problem:
-> https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02272.html
-
-That's a pci/rcu change; ccing Paolo and Michael.
-
-> Steps to reproduce:
-> (1) Create VM (virsh define)
-> (2) Add 64 virtio scsi disks
-
-Is that hot adding the disks later, or are they included in the VM at
-creation?
-Can you provide a libvirt XML example?
-
-> (3) Migrate the VM and the VM’s disks
-
-What do you mean by 'and vm disks' - are you doing a block migration?
-
-Dave
---
-Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
-
-> -----Original Message-----
-> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
-> Sent: 8 April 2021 19:27
-> To: yuchen (Cloud); pbonzini@redhat.com; mst@redhat.com
-> Cc: qemu-devel@nongnu.org
-> Subject: Re: [Qemu-devel][bug] qemu crash when migrate vm and vm's disks
->
-> * Yuchen (yu.chen@h3c.com) wrote:
-> > When migrating a VM and the VM’s disks, the target host's QEMU crashes due to an invalid
-> > free.
-> >
-> > #0  object_unref (obj=0x1000) at
-> > /qemu-2.12/rpmbuild/BUILD/qemu-2.12/qom/object.c:920
-> > #1  0x0000560434d79e79 in memory_region_unref (mr=)
-> >     at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:1730
-> > #2  flatview_destroy (view=0x560439653880) at
-> > /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:292
-> > #3  0x000056043514dfbe in call_rcu_thread (opaque=)
-> >     at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/util/rcu.c:284
-> > #4  0x00007fbc2b36fe25 in start_thread () from /lib64/libpthread.so.0
-> > #5  0x00007fbc2b099bad in clone () from /lib64/libc.so.6
-> >
-> > Tested against base qemu-2.12.0, but it also reproduces with the latest qemu (v6.0.0-rc2).
->
-> Interesting.
->
-> > The following patch can resolve this problem:
-> > https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02272.html
->
-> That's a pci/rcu change; ccing Paolo and Michael.
->
-> > Steps to reproduce:
-> > (1) Create VM (virsh define)
-> > (2) Add 64 virtio scsi disks
->
-> Is that hot adding the disks later, or are they included in the VM at
-> creation?
-> Can you provide a libvirt XML example?
->
-Include disks in the VM at creation.
-
-vm disks xml (only virtio scsi disks):
-[libvirt XML elided: the virtio-scsi <disk>/<address> definitions lost their markup when this message was archived, leaving only blank lines]
-
-vm disks xml (only virtio disks):
-[libvirt XML elided: the virtio <disk>/<address> definitions lost their markup when this message was archived, leaving only blank lines]
-> > (3) Migrate the VM and the VM’s disks
->
-> What do you mean by 'and vm disks' - are you doing a block migration?
-
-Yes, block migration.
-In fact, it also reproduces when only the domain is migrated.
-
-> Dave
-> --
-> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

diff --git a/classification_output/01/mistranslation/1267916 b/classification_output/01/mistranslation/1267916
deleted file mode 100644
index fffafcf7..00000000
--- a/classification_output/01/mistranslation/1267916
+++ /dev/null
@@ -1,1878 +0,0 @@
-mistranslation: 0.927
-instruction: 0.903
-semantic: 0.891
-other: 0.877
-
-[Qemu-devel] [TCG only][Migration Bug?] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration
-
-Hi all,
-
-Does anybody remember the similar issue posted by hailiang months ago
-http://patchwork.ozlabs.org/patch/454322/
-At least two bugs about migration had been fixed since that.
-And now we found the same issue on a tcg vm (kvm is fine): after
-migration, the content of the VM's memory is inconsistent.
-We added a patch to check the memory content; you can find it in the appendix below.
-
-Steps to reproduce:
-1) apply the patch and re-build qemu
-2) prepare the ubuntu guest and run memtest in grub.
-source side:
-x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
-e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
-if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
-virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
--vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
-tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
-pc-i440fx-2.3,accel=tcg,usb=off
-destination side:
-x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
-e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
-if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
-virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
--vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
-tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
-pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881
-3) start migration
-with a 1000M NIC, migration will finish within 3 min.
-
-at source:
-(qemu) migrate tcp:192.168.2.66:8881
-after saving ram complete
-e9e725df678d392b1a83b3a917f332bb
-qemu-system-x86_64: end ram md5
-(qemu)
-
-at destination:
-...skip...
-Completed load of VM with exit code 0 seq iteration 1264
-Completed load of VM with exit code 0 seq iteration 1265
-Completed load of VM with exit code 0 seq iteration 1266
-qemu-system-x86_64: after loading state section id 2(ram)
-49c2dac7bde0e5e22db7280dcb3824f9
-qemu-system-x86_64: end ram md5
-qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init
-
-49c2dac7bde0e5e22db7280dcb3824f9
-qemu-system-x86_64: end ram md5
-
-This occurs occasionally, and only on a TCG machine. It seems that
-some pages dirtied on the source side are not transferred to the destination.
-This problem can be reproduced even if we disable virtio. -Is it OK for some pages that not transferred to destination when do -migration ? Or is it a bug? -Any idea... - -=================md5 check patch============================= - -diff --git a/Makefile.target b/Makefile.target -index 962d004..e2cb8e9 100644 ---- a/Makefile.target -+++ b/Makefile.target -@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o - obj-y += memory_mapping.o - obj-y += dump.o - obj-y += migration/ram.o migration/savevm.o --LIBS := $(libs_softmmu) $(LIBS) -+LIBS := $(libs_softmmu) $(LIBS) -lplumb - - # xen support - obj-$(CONFIG_XEN) += xen-common.o -diff --git a/migration/ram.c b/migration/ram.c -index 1eb155a..3b7a09d 100644 ---- a/migration/ram.c -+++ b/migration/ram.c -@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int -version_id) -} - - rcu_read_unlock(); -- DPRINTF("Completed load of VM with exit code %d seq iteration " -+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " - "%" PRIu64 "\n", ret, seq_iter); - return ret; - } -diff --git a/migration/savevm.c b/migration/savevm.c -index 0ad1b93..3feaa61 100644 ---- a/migration/savevm.c -+++ b/migration/savevm.c -@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) - - } - -+#include "exec/ram_addr.h" -+#include "qemu/rcu_queue.h" -+#include -+#ifndef MD5_DIGEST_LENGTH -+#define MD5_DIGEST_LENGTH 16 -+#endif -+ -+static void check_host_md5(void) -+{ -+ int i; -+ unsigned char md[MD5_DIGEST_LENGTH]; -+ rcu_read_lock(); -+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check -'pc.ram' block */ -+ rcu_read_unlock(); -+ -+ MD5(block->host, block->used_length, md); -+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { -+ fprintf(stderr, "%02x", md[i]); -+ } -+ fprintf(stderr, "\n"); -+ error_report("end ram md5"); -+} -+ - void qemu_savevm_state_begin(QEMUFile *f, - const MigrationParams *params) - { -@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile -*f, bool 
iterable_only) -save_section_header(f, se, QEMU_VM_SECTION_END); - - ret = se->ops->save_live_complete_precopy(f, se->opaque); -+ -+ fprintf(stderr, "after saving %s complete\n", se->idstr); -+ check_host_md5(); -+ - trace_savevm_section_end(se->idstr, se->section_id, ret); - save_section_footer(f, se); - if (ret < 0) { -@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, -MigrationIncomingState *mis) -section_id, le->se->idstr); - return ret; - } -+ if (section_type == QEMU_VM_SECTION_END) { -+ error_report("after loading state section id %d(%s)", -+ section_id, le->se->idstr); -+ check_host_md5(); -+ } - if (!check_section_footer(f, le)) { - return -EINVAL; - } -@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) - } - - cpu_synchronize_all_post_init(); -+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); -+ check_host_md5(); - - return ret; - } - -* Li Zhijian (address@hidden) wrote: -> -Hi all, -> -> -Does anyboday remember the similar issue post by hailiang months ago -> -http://patchwork.ozlabs.org/patch/454322/ -> -At least tow bugs about migration had been fixed since that. -Yes, I wondered what happened to that. - -> -And now we found the same issue at the tcg vm(kvm is fine), after migration, -> -the content VM's memory is inconsistent. -Hmm, TCG only - I don't know much about that; but I guess something must -be accessing memory without using the proper macros/functions so -it doesn't mark it as dirty. - -> -we add a patch to check memory content, you can find it from affix -> -> -steps to reporduce: -> -1) apply the patch and re-build qemu -> -2) prepare the ubuntu guest and run memtest in grub. 
-> -soruce side: -> -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -> -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -> -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -> -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -> --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -> -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -> -pc-i440fx-2.3,accel=tcg,usb=off -> -> -destination side: -> -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -> -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -> -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -> -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -> --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -> -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -> -pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 -> -> -3) start migration -> -with 1000M NIC, migration will finish within 3 min. -> -> -at source: -> -(qemu) migrate tcp:192.168.2.66:8881 -> -after saving ram complete -> -e9e725df678d392b1a83b3a917f332bb -> -qemu-system-x86_64: end ram md5 -> -(qemu) -> -> -at destination: -> -...skip... -> -Completed load of VM with exit code 0 seq iteration 1264 -> -Completed load of VM with exit code 0 seq iteration 1265 -> -Completed load of VM with exit code 0 seq iteration 1266 -> -qemu-system-x86_64: after loading state section id 2(ram) -> -49c2dac7bde0e5e22db7280dcb3824f9 -> -qemu-system-x86_64: end ram md5 -> -qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init -> -> -49c2dac7bde0e5e22db7280dcb3824f9 -> -qemu-system-x86_64: end ram md5 -> -> -This occurs occasionally and only at tcg machine. It seems that -> -some pages dirtied in source side don't transferred to destination. -> -This problem can be reproduced even if we disable virtio. 
-> -> -Is it OK for some pages that not transferred to destination when do -> -migration ? Or is it a bug? -I'm pretty sure that means it's a bug. Hard to find though, I guess -at least memtest is smaller than a big OS. I think I'd dump the whole -of memory on both sides, hexdump and diff them - I'd guess it would -just be one byte/word different, maybe that would offer some idea what -wrote it. - -Dave - -> -Any idea... -> -> -=================md5 check patch============================= -> -> -diff --git a/Makefile.target b/Makefile.target -> -index 962d004..e2cb8e9 100644 -> ---- a/Makefile.target -> -+++ b/Makefile.target -> -@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o -> -obj-y += memory_mapping.o -> -obj-y += dump.o -> -obj-y += migration/ram.o migration/savevm.o -> --LIBS := $(libs_softmmu) $(LIBS) -> -+LIBS := $(libs_softmmu) $(LIBS) -lplumb -> -> -# xen support -> -obj-$(CONFIG_XEN) += xen-common.o -> -diff --git a/migration/ram.c b/migration/ram.c -> -index 1eb155a..3b7a09d 100644 -> ---- a/migration/ram.c -> -+++ b/migration/ram.c -> -@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int -> -version_id) -> -} -> -> -rcu_read_unlock(); -> -- DPRINTF("Completed load of VM with exit code %d seq iteration " -> -+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " -> -"%" PRIu64 "\n", ret, seq_iter); -> -return ret; -> -} -> -diff --git a/migration/savevm.c b/migration/savevm.c -> -index 0ad1b93..3feaa61 100644 -> ---- a/migration/savevm.c -> -+++ b/migration/savevm.c -> -@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) -> -> -} -> -> -+#include "exec/ram_addr.h" -> -+#include "qemu/rcu_queue.h" -> -+#include -> -+#ifndef MD5_DIGEST_LENGTH -> -+#define MD5_DIGEST_LENGTH 16 -> -+#endif -> -+ -> -+static void check_host_md5(void) -> -+{ -> -+ int i; -> -+ unsigned char md[MD5_DIGEST_LENGTH]; -> -+ rcu_read_lock(); -> -+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check -> -'pc.ram' block 
*/ -> -+ rcu_read_unlock(); -> -+ -> -+ MD5(block->host, block->used_length, md); -> -+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { -> -+ fprintf(stderr, "%02x", md[i]); -> -+ } -> -+ fprintf(stderr, "\n"); -> -+ error_report("end ram md5"); -> -+} -> -+ -> -void qemu_savevm_state_begin(QEMUFile *f, -> -const MigrationParams *params) -> -{ -> -@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, -> -bool iterable_only) -> -save_section_header(f, se, QEMU_VM_SECTION_END); -> -> -ret = se->ops->save_live_complete_precopy(f, se->opaque); -> -+ -> -+ fprintf(stderr, "after saving %s complete\n", se->idstr); -> -+ check_host_md5(); -> -+ -> -trace_savevm_section_end(se->idstr, se->section_id, ret); -> -save_section_footer(f, se); -> -if (ret < 0) { -> -@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, -> -MigrationIncomingState *mis) -> -section_id, le->se->idstr); -> -return ret; -> -} -> -+ if (section_type == QEMU_VM_SECTION_END) { -> -+ error_report("after loading state section id %d(%s)", -> -+ section_id, le->se->idstr); -> -+ check_host_md5(); -> -+ } -> -if (!check_section_footer(f, le)) { -> -return -EINVAL; -> -} -> -@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) -> -} -> -> -cpu_synchronize_all_post_init(); -> -+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); -> -+ check_host_md5(); -> -> -return ret; -> -} -> -> -> --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - -On 2015/12/3 17:24, Dr. David Alan Gilbert wrote: -* Li Zhijian (address@hidden) wrote: -Hi all, - -Does anyboday remember the similar issue post by hailiang months ago -http://patchwork.ozlabs.org/patch/454322/ -At least tow bugs about migration had been fixed since that. -Yes, I wondered what happened to that. -And now we found the same issue at the tcg vm(kvm is fine), after migration, -the content VM's memory is inconsistent. 
-Hmm, TCG only - I don't know much about that; but I guess something must -be accessing memory without using the proper macros/functions so -it doesn't mark it as dirty. -we add a patch to check memory content, you can find it from affix - -steps to reporduce: -1) apply the patch and re-build qemu -2) prepare the ubuntu guest and run memtest in grub. -soruce side: -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -pc-i440fx-2.3,accel=tcg,usb=off - -destination side: -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 - -3) start migration -with 1000M NIC, migration will finish within 3 min. - -at source: -(qemu) migrate tcp:192.168.2.66:8881 -after saving ram complete -e9e725df678d392b1a83b3a917f332bb -qemu-system-x86_64: end ram md5 -(qemu) - -at destination: -...skip... 
-Completed load of VM with exit code 0 seq iteration 1264 -Completed load of VM with exit code 0 seq iteration 1265 -Completed load of VM with exit code 0 seq iteration 1266 -qemu-system-x86_64: after loading state section id 2(ram) -49c2dac7bde0e5e22db7280dcb3824f9 -qemu-system-x86_64: end ram md5 -qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init - -49c2dac7bde0e5e22db7280dcb3824f9 -qemu-system-x86_64: end ram md5 - -This occurs occasionally and only at tcg machine. It seems that -some pages dirtied in source side don't transferred to destination. -This problem can be reproduced even if we disable virtio. - -Is it OK for some pages that not transferred to destination when do -migration ? Or is it a bug? -I'm pretty sure that means it's a bug. Hard to find though, I guess -at least memtest is smaller than a big OS. I think I'd dump the whole -of memory on both sides, hexdump and diff them - I'd guess it would -just be one byte/word different, maybe that would offer some idea what -wrote it. -Maybe one better way to do that is with the help of userfaultfd's write-protect -capability. It is still in the development by Andrea Arcangeli, but there -is a RFC version available, please refer to -http://www.spinics.net/lists/linux-mm/msg97422.html -(I'm developing live memory snapshot which based on it, maybe this is another -scene where we -can use userfaultfd's WP ;) ). -Dave -Any idea... 
- -=================md5 check patch============================= - -diff --git a/Makefile.target b/Makefile.target -index 962d004..e2cb8e9 100644 ---- a/Makefile.target -+++ b/Makefile.target -@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o - obj-y += memory_mapping.o - obj-y += dump.o - obj-y += migration/ram.o migration/savevm.o --LIBS := $(libs_softmmu) $(LIBS) -+LIBS := $(libs_softmmu) $(LIBS) -lplumb - - # xen support - obj-$(CONFIG_XEN) += xen-common.o -diff --git a/migration/ram.c b/migration/ram.c -index 1eb155a..3b7a09d 100644 ---- a/migration/ram.c -+++ b/migration/ram.c -@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int -version_id) - } - - rcu_read_unlock(); -- DPRINTF("Completed load of VM with exit code %d seq iteration " -+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " - "%" PRIu64 "\n", ret, seq_iter); - return ret; - } -diff --git a/migration/savevm.c b/migration/savevm.c -index 0ad1b93..3feaa61 100644 ---- a/migration/savevm.c -+++ b/migration/savevm.c -@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) - - } - -+#include "exec/ram_addr.h" -+#include "qemu/rcu_queue.h" -+#include -+#ifndef MD5_DIGEST_LENGTH -+#define MD5_DIGEST_LENGTH 16 -+#endif -+ -+static void check_host_md5(void) -+{ -+ int i; -+ unsigned char md[MD5_DIGEST_LENGTH]; -+ rcu_read_lock(); -+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check -'pc.ram' block */ -+ rcu_read_unlock(); -+ -+ MD5(block->host, block->used_length, md); -+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { -+ fprintf(stderr, "%02x", md[i]); -+ } -+ fprintf(stderr, "\n"); -+ error_report("end ram md5"); -+} -+ - void qemu_savevm_state_begin(QEMUFile *f, - const MigrationParams *params) - { -@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, -bool iterable_only) - save_section_header(f, se, QEMU_VM_SECTION_END); - - ret = se->ops->save_live_complete_precopy(f, se->opaque); -+ -+ fprintf(stderr, "after saving %s 
complete\n", se->idstr); -+ check_host_md5(); -+ - trace_savevm_section_end(se->idstr, se->section_id, ret); - save_section_footer(f, se); - if (ret < 0) { -@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, -MigrationIncomingState *mis) - section_id, le->se->idstr); - return ret; - } -+ if (section_type == QEMU_VM_SECTION_END) { -+ error_report("after loading state section id %d(%s)", -+ section_id, le->se->idstr); -+ check_host_md5(); -+ } - if (!check_section_footer(f, le)) { - return -EINVAL; - } -@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) - } - - cpu_synchronize_all_post_init(); -+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); -+ check_host_md5(); - - return ret; - } --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - -. - -On 12/03/2015 05:37 PM, Hailiang Zhang wrote: -On 2015/12/3 17:24, Dr. David Alan Gilbert wrote: -* Li Zhijian (address@hidden) wrote: -Hi all, - -Does anyboday remember the similar issue post by hailiang months ago -http://patchwork.ozlabs.org/patch/454322/ -At least tow bugs about migration had been fixed since that. -Yes, I wondered what happened to that. -And now we found the same issue at the tcg vm(kvm is fine), after -migration, -the content VM's memory is inconsistent. -Hmm, TCG only - I don't know much about that; but I guess something must -be accessing memory without using the proper macros/functions so -it doesn't mark it as dirty. -we add a patch to check memory content, you can find it from affix - -steps to reporduce: -1) apply the patch and re-build qemu -2) prepare the ubuntu guest and run memtest in grub. 
-soruce side: -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 - --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -pc-i440fx-2.3,accel=tcg,usb=off - -destination side: -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 - --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 - -3) start migration -with 1000M NIC, migration will finish within 3 min. - -at source: -(qemu) migrate tcp:192.168.2.66:8881 -after saving ram complete -e9e725df678d392b1a83b3a917f332bb -qemu-system-x86_64: end ram md5 -(qemu) - -at destination: -...skip... -Completed load of VM with exit code 0 seq iteration 1264 -Completed load of VM with exit code 0 seq iteration 1265 -Completed load of VM with exit code 0 seq iteration 1266 -qemu-system-x86_64: after loading state section id 2(ram) -49c2dac7bde0e5e22db7280dcb3824f9 -qemu-system-x86_64: end ram md5 -qemu-system-x86_64: qemu_loadvm_state: after -cpu_synchronize_all_post_init - -49c2dac7bde0e5e22db7280dcb3824f9 -qemu-system-x86_64: end ram md5 - -This occurs occasionally and only at tcg machine. It seems that -some pages dirtied in source side don't transferred to destination. -This problem can be reproduced even if we disable virtio. - -Is it OK for some pages that not transferred to destination when do -migration ? Or is it a bug? -I'm pretty sure that means it's a bug. 
Hard to find though, I guess -at least memtest is smaller than a big OS. I think I'd dump the whole -of memory on both sides, hexdump and diff them - I'd guess it would -just be one byte/word different, maybe that would offer some idea what -wrote it. -Maybe one better way to do that is with the help of userfaultfd's -write-protect -capability. It is still in the development by Andrea Arcangeli, but there -is a RFC version available, please refer to -http://www.spinics.net/lists/linux-mm/msg97422.html -(I'm developing live memory snapshot which based on it, maybe this is -another scene where we -can use userfaultfd's WP ;) ). -sounds good. - -thanks -Li -Dave -Any idea... - -=================md5 check patch============================= - -diff --git a/Makefile.target b/Makefile.target -index 962d004..e2cb8e9 100644 ---- a/Makefile.target -+++ b/Makefile.target -@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o - obj-y += memory_mapping.o - obj-y += dump.o - obj-y += migration/ram.o migration/savevm.o --LIBS := $(libs_softmmu) $(LIBS) -+LIBS := $(libs_softmmu) $(LIBS) -lplumb - - # xen support - obj-$(CONFIG_XEN) += xen-common.o -diff --git a/migration/ram.c b/migration/ram.c -index 1eb155a..3b7a09d 100644 ---- a/migration/ram.c -+++ b/migration/ram.c -@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int -version_id) - } - - rcu_read_unlock(); -- DPRINTF("Completed load of VM with exit code %d seq iteration " -+ fprintf(stderr, "Completed load of VM with exit code %d seq -iteration " - "%" PRIu64 "\n", ret, seq_iter); - return ret; - } -diff --git a/migration/savevm.c b/migration/savevm.c -index 0ad1b93..3feaa61 100644 ---- a/migration/savevm.c -+++ b/migration/savevm.c -@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) - - } - -+#include "exec/ram_addr.h" -+#include "qemu/rcu_queue.h" -+#include -+#ifndef MD5_DIGEST_LENGTH -+#define MD5_DIGEST_LENGTH 16 -+#endif -+ -+static void check_host_md5(void) -+{ -+ int i; -+ unsigned char 
md[MD5_DIGEST_LENGTH]; -+ rcu_read_lock(); -+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check -'pc.ram' block */ -+ rcu_read_unlock(); -+ -+ MD5(block->host, block->used_length, md); -+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { -+ fprintf(stderr, "%02x", md[i]); -+ } -+ fprintf(stderr, "\n"); -+ error_report("end ram md5"); -+} -+ - void qemu_savevm_state_begin(QEMUFile *f, - const MigrationParams *params) - { -@@ -1056,6 +1079,10 @@ void -qemu_savevm_state_complete_precopy(QEMUFile *f, -bool iterable_only) - save_section_header(f, se, QEMU_VM_SECTION_END); - - ret = se->ops->save_live_complete_precopy(f, se->opaque); -+ -+ fprintf(stderr, "after saving %s complete\n", se->idstr); -+ check_host_md5(); -+ - trace_savevm_section_end(se->idstr, se->section_id, ret); - save_section_footer(f, se); - if (ret < 0) { -@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, -MigrationIncomingState *mis) - section_id, le->se->idstr); - return ret; - } -+ if (section_type == QEMU_VM_SECTION_END) { -+ error_report("after loading state section id %d(%s)", -+ section_id, le->se->idstr); -+ check_host_md5(); -+ } - if (!check_section_footer(f, le)) { - return -EINVAL; - } -@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) - } - - cpu_synchronize_all_post_init(); -+ error_report("%s: after cpu_synchronize_all_post_init\n", -__func__); -+ check_host_md5(); - - return ret; - } --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - -. -. --- -Best regards. -Li Zhijian (8555) - -On 12/03/2015 05:24 PM, Dr. David Alan Gilbert wrote: -* Li Zhijian (address@hidden) wrote: -Hi all, - -Does anyboday remember the similar issue post by hailiang months ago -http://patchwork.ozlabs.org/patch/454322/ -At least tow bugs about migration had been fixed since that. -Yes, I wondered what happened to that. -And now we found the same issue at the tcg vm(kvm is fine), after migration, -the content VM's memory is inconsistent. 
-Hmm, TCG only - I don't know much about that; but I guess something must -be accessing memory without using the proper macros/functions so -it doesn't mark it as dirty. -we add a patch to check memory content, you can find it from affix - -steps to reporduce: -1) apply the patch and re-build qemu -2) prepare the ubuntu guest and run memtest in grub. -soruce side: -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -pc-i440fx-2.3,accel=tcg,usb=off - -destination side: -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 - -3) start migration -with 1000M NIC, migration will finish within 3 min. - -at source: -(qemu) migrate tcp:192.168.2.66:8881 -after saving ram complete -e9e725df678d392b1a83b3a917f332bb -qemu-system-x86_64: end ram md5 -(qemu) - -at destination: -...skip... 
-Completed load of VM with exit code 0 seq iteration 1264 -Completed load of VM with exit code 0 seq iteration 1265 -Completed load of VM with exit code 0 seq iteration 1266 -qemu-system-x86_64: after loading state section id 2(ram) -49c2dac7bde0e5e22db7280dcb3824f9 -qemu-system-x86_64: end ram md5 -qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init - -49c2dac7bde0e5e22db7280dcb3824f9 -qemu-system-x86_64: end ram md5 - -This occurs occasionally and only at tcg machine. It seems that -some pages dirtied in source side don't transferred to destination. -This problem can be reproduced even if we disable virtio. - -Is it OK for some pages that not transferred to destination when do -migration ? Or is it a bug? -I'm pretty sure that means it's a bug. Hard to find though, I guess -at least memtest is smaller than a big OS. I think I'd dump the whole -of memory on both sides, hexdump and diff them - I'd guess it would -just be one byte/word different, maybe that would offer some idea what -wrote it. -I try to dump and compare them, more than 10 pages are different. -in source side, they are random value rather than always 'FF' 'FB' 'EF' -'BF'... in destination. -and not all of the different pages are continuous. - -thanks -Li -Dave -Any idea... 
- -=================md5 check patch============================= - -diff --git a/Makefile.target b/Makefile.target -index 962d004..e2cb8e9 100644 ---- a/Makefile.target -+++ b/Makefile.target -@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o - obj-y += memory_mapping.o - obj-y += dump.o - obj-y += migration/ram.o migration/savevm.o --LIBS := $(libs_softmmu) $(LIBS) -+LIBS := $(libs_softmmu) $(LIBS) -lplumb - - # xen support - obj-$(CONFIG_XEN) += xen-common.o -diff --git a/migration/ram.c b/migration/ram.c -index 1eb155a..3b7a09d 100644 ---- a/migration/ram.c -+++ b/migration/ram.c -@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int -version_id) - } - - rcu_read_unlock(); -- DPRINTF("Completed load of VM with exit code %d seq iteration " -+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " - "%" PRIu64 "\n", ret, seq_iter); - return ret; - } -diff --git a/migration/savevm.c b/migration/savevm.c -index 0ad1b93..3feaa61 100644 ---- a/migration/savevm.c -+++ b/migration/savevm.c -@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) - - } - -+#include "exec/ram_addr.h" -+#include "qemu/rcu_queue.h" -+#include -+#ifndef MD5_DIGEST_LENGTH -+#define MD5_DIGEST_LENGTH 16 -+#endif -+ -+static void check_host_md5(void) -+{ -+ int i; -+ unsigned char md[MD5_DIGEST_LENGTH]; -+ rcu_read_lock(); -+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check -'pc.ram' block */ -+ rcu_read_unlock(); -+ -+ MD5(block->host, block->used_length, md); -+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { -+ fprintf(stderr, "%02x", md[i]); -+ } -+ fprintf(stderr, "\n"); -+ error_report("end ram md5"); -+} -+ - void qemu_savevm_state_begin(QEMUFile *f, - const MigrationParams *params) - { -@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, -bool iterable_only) - save_section_header(f, se, QEMU_VM_SECTION_END); - - ret = se->ops->save_live_complete_precopy(f, se->opaque); -+ -+ fprintf(stderr, "after saving %s 
complete\n", se->idstr); -+ check_host_md5(); -+ - trace_savevm_section_end(se->idstr, se->section_id, ret); - save_section_footer(f, se); - if (ret < 0) { -@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, -MigrationIncomingState *mis) - section_id, le->se->idstr); - return ret; - } -+ if (section_type == QEMU_VM_SECTION_END) { -+ error_report("after loading state section id %d(%s)", -+ section_id, le->se->idstr); -+ check_host_md5(); -+ } - if (!check_section_footer(f, le)) { - return -EINVAL; - } -@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) - } - - cpu_synchronize_all_post_init(); -+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); -+ check_host_md5(); - - return ret; - } --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - - -. --- -Best regards. -Li Zhijian (8555) - -* Li Zhijian (address@hidden) wrote: -> -> -> -On 12/03/2015 05:24 PM, Dr. David Alan Gilbert wrote: -> ->* Li Zhijian (address@hidden) wrote: -> ->>Hi all, -> ->> -> ->>Does anyboday remember the similar issue post by hailiang months ago -> ->> -http://patchwork.ozlabs.org/patch/454322/ -> ->>At least tow bugs about migration had been fixed since that. -> -> -> ->Yes, I wondered what happened to that. -> -> -> ->>And now we found the same issue at the tcg vm(kvm is fine), after migration, -> ->>the content VM's memory is inconsistent. -> -> -> ->Hmm, TCG only - I don't know much about that; but I guess something must -> ->be accessing memory without using the proper macros/functions so -> ->it doesn't mark it as dirty. -> -> -> ->>we add a patch to check memory content, you can find it from affix -> ->> -> ->>steps to reporduce: -> ->>1) apply the patch and re-build qemu -> ->>2) prepare the ubuntu guest and run memtest in grub. 
-> ->>soruce side: -> ->>x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -> ->>e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -> ->>if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -> ->>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -> ->>-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -> ->>tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -> ->>pc-i440fx-2.3,accel=tcg,usb=off -> ->> -> ->>destination side: -> ->>x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -> ->>e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -> ->>if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -> ->>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -> ->>-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -> ->>tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -> ->>pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 -> ->> -> ->>3) start migration -> ->>with 1000M NIC, migration will finish within 3 min. -> ->> -> ->>at source: -> ->>(qemu) migrate tcp:192.168.2.66:8881 -> ->>after saving ram complete -> ->>e9e725df678d392b1a83b3a917f332bb -> ->>qemu-system-x86_64: end ram md5 -> ->>(qemu) -> ->> -> ->>at destination: -> ->>...skip... -> ->>Completed load of VM with exit code 0 seq iteration 1264 -> ->>Completed load of VM with exit code 0 seq iteration 1265 -> ->>Completed load of VM with exit code 0 seq iteration 1266 -> ->>qemu-system-x86_64: after loading state section id 2(ram) -> ->>49c2dac7bde0e5e22db7280dcb3824f9 -> ->>qemu-system-x86_64: end ram md5 -> ->>qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init -> ->> -> ->>49c2dac7bde0e5e22db7280dcb3824f9 -> ->>qemu-system-x86_64: end ram md5 -> ->> -> ->>This occurs occasionally and only at tcg machine. It seems that -> ->>some pages dirtied in source side don't transferred to destination. 
-> ->>This problem can be reproduced even if we disable virtio. -> ->> -> ->>Is it OK for some pages that not transferred to destination when do -> ->>migration ? Or is it a bug? -> -> -> ->I'm pretty sure that means it's a bug. Hard to find though, I guess -> ->at least memtest is smaller than a big OS. I think I'd dump the whole -> ->of memory on both sides, hexdump and diff them - I'd guess it would -> ->just be one byte/word different, maybe that would offer some idea what -> ->wrote it. -> -> -I try to dump and compare them, more than 10 pages are different. -> -in source side, they are random value rather than always 'FF' 'FB' 'EF' -> -'BF'... in destination. -> -> -and not all of the different pages are continuous. -I wonder if it happens on all of memtest's different test patterns, -perhaps it might be possible to narrow it down if you tell memtest -to only run one test at a time. - -Dave - -> -> -thanks -> -Li -> -> -> -> -> ->Dave -> -> -> ->>Any idea... -> ->> -> ->>=================md5 check patch============================= -> ->> -> ->>diff --git a/Makefile.target b/Makefile.target -> ->>index 962d004..e2cb8e9 100644 -> ->>--- a/Makefile.target -> ->>+++ b/Makefile.target -> ->>@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o -> ->> obj-y += memory_mapping.o -> ->> obj-y += dump.o -> ->> obj-y += migration/ram.o migration/savevm.o -> ->>-LIBS := $(libs_softmmu) $(LIBS) -> ->>+LIBS := $(libs_softmmu) $(LIBS) -lplumb -> ->> -> ->> # xen support -> ->> obj-$(CONFIG_XEN) += xen-common.o -> ->>diff --git a/migration/ram.c b/migration/ram.c -> ->>index 1eb155a..3b7a09d 100644 -> ->>--- a/migration/ram.c -> ->>+++ b/migration/ram.c -> ->>@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int -> ->>version_id) -> ->> } -> ->> -> ->> rcu_read_unlock(); -> ->>- DPRINTF("Completed load of VM with exit code %d seq iteration " -> ->>+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " -> ->> "%" PRIu64 "\n", ret, seq_iter); -> 
->> return ret; -> ->> } -> ->>diff --git a/migration/savevm.c b/migration/savevm.c -> ->>index 0ad1b93..3feaa61 100644 -> ->>--- a/migration/savevm.c -> ->>+++ b/migration/savevm.c -> ->>@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) -> ->> -> ->> } -> ->> -> ->>+#include "exec/ram_addr.h" -> ->>+#include "qemu/rcu_queue.h" -> ->>+#include -> ->>+#ifndef MD5_DIGEST_LENGTH -> ->>+#define MD5_DIGEST_LENGTH 16 -> ->>+#endif -> ->>+ -> ->>+static void check_host_md5(void) -> ->>+{ -> ->>+ int i; -> ->>+ unsigned char md[MD5_DIGEST_LENGTH]; -> ->>+ rcu_read_lock(); -> ->>+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check -> ->>'pc.ram' block */ -> ->>+ rcu_read_unlock(); -> ->>+ -> ->>+ MD5(block->host, block->used_length, md); -> ->>+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { -> ->>+ fprintf(stderr, "%02x", md[i]); -> ->>+ } -> ->>+ fprintf(stderr, "\n"); -> ->>+ error_report("end ram md5"); -> ->>+} -> ->>+ -> ->> void qemu_savevm_state_begin(QEMUFile *f, -> ->> const MigrationParams *params) -> ->> { -> ->>@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, -> ->>bool iterable_only) -> ->> save_section_header(f, se, QEMU_VM_SECTION_END); -> ->> -> ->> ret = se->ops->save_live_complete_precopy(f, se->opaque); -> ->>+ -> ->>+ fprintf(stderr, "after saving %s complete\n", se->idstr); -> ->>+ check_host_md5(); -> ->>+ -> ->> trace_savevm_section_end(se->idstr, se->section_id, ret); -> ->> save_section_footer(f, se); -> ->> if (ret < 0) { -> ->>@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, -> ->>MigrationIncomingState *mis) -> ->> section_id, le->se->idstr); -> ->> return ret; -> ->> } -> ->>+ if (section_type == QEMU_VM_SECTION_END) { -> ->>+ error_report("after loading state section id %d(%s)", -> ->>+ section_id, le->se->idstr); -> ->>+ check_host_md5(); -> ->>+ } -> ->> if (!check_section_footer(f, le)) { -> ->> return -EINVAL; -> ->> } -> ->>@@ -1901,6 +1933,8 @@ int 
qemu_loadvm_state(QEMUFile *f) -> ->> } -> ->> -> ->> cpu_synchronize_all_post_init(); -> ->>+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); -> ->>+ check_host_md5(); -> ->> -> ->> return ret; -> ->> } -> ->> -> ->> -> ->> -> ->-- -> ->Dr. David Alan Gilbert / address@hidden / Manchester, UK -> -> -> -> -> ->. -> -> -> -> --- -> -Best regards. -> -Li Zhijian (8555) -> -> --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - -Li Zhijian wrote: -> -Hi all, -> -> -Does anyboday remember the similar issue post by hailiang months ago -> -http://patchwork.ozlabs.org/patch/454322/ -> -At least tow bugs about migration had been fixed since that. -> -> -And now we found the same issue at the tcg vm(kvm is fine), after -> -migration, the content VM's memory is inconsistent. -> -> -we add a patch to check memory content, you can find it from affix -> -> -steps to reporduce: -> -1) apply the patch and re-build qemu -> -2) prepare the ubuntu guest and run memtest in grub. 
-> -soruce side: -> -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -> -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -> -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -> -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -> --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -> -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -> -pc-i440fx-2.3,accel=tcg,usb=off -> -> -destination side: -> -x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device -> -e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive -> -if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device -> -virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -> --vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp -> -tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine -> -pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 -> -> -3) start migration -> -with 1000M NIC, migration will finish within 3 min. -> -> -at source: -> -(qemu) migrate tcp:192.168.2.66:8881 -> -after saving ram complete -> -e9e725df678d392b1a83b3a917f332bb -> -qemu-system-x86_64: end ram md5 -> -(qemu) -> -> -at destination: -> -...skip... -> -Completed load of VM with exit code 0 seq iteration 1264 -> -Completed load of VM with exit code 0 seq iteration 1265 -> -Completed load of VM with exit code 0 seq iteration 1266 -> -qemu-system-x86_64: after loading state section id 2(ram) -> -49c2dac7bde0e5e22db7280dcb3824f9 -> -qemu-system-x86_64: end ram md5 -> -qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init -> -> -49c2dac7bde0e5e22db7280dcb3824f9 -> -qemu-system-x86_64: end ram md5 -> -> -This occurs occasionally and only at tcg machine. It seems that -> -some pages dirtied in source side don't transferred to destination. -> -This problem can be reproduced even if we disable virtio. 
-> -> -Is it OK for some pages that not transferred to destination when do -> -migration ? Or is it a bug? -> -> -Any idea... -Thanks for describing how to reproduce the bug. -If some pages are not transferred to destination then it is a bug, so we -need to know what the problem is, notice that the problem can be that -TCG is not marking dirty some page, that Migration code "forgets" about -that page, or anything eles altogether, that is what we need to find. - -There are more posibilities, I am not sure that memtest is on 32bit -mode, and it is inside posibility that we are missing some state when we -are on real mode. - -Will try to take a look at this. - -THanks, again. - - -> -> -=================md5 check patch============================= -> -> -diff --git a/Makefile.target b/Makefile.target -> -index 962d004..e2cb8e9 100644 -> ---- a/Makefile.target -> -+++ b/Makefile.target -> -@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o -> -obj-y += memory_mapping.o -> -obj-y += dump.o -> -obj-y += migration/ram.o migration/savevm.o -> --LIBS := $(libs_softmmu) $(LIBS) -> -+LIBS := $(libs_softmmu) $(LIBS) -lplumb -> -> -# xen support -> -obj-$(CONFIG_XEN) += xen-common.o -> -diff --git a/migration/ram.c b/migration/ram.c -> -index 1eb155a..3b7a09d 100644 -> ---- a/migration/ram.c -> -+++ b/migration/ram.c -> -@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, -> -int version_id) -> -} -> -> -rcu_read_unlock(); -> -- DPRINTF("Completed load of VM with exit code %d seq iteration " -> -+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " -> -"%" PRIu64 "\n", ret, seq_iter); -> -return ret; -> -} -> -diff --git a/migration/savevm.c b/migration/savevm.c -> -index 0ad1b93..3feaa61 100644 -> ---- a/migration/savevm.c -> -+++ b/migration/savevm.c -> -@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) -> -> -} -> -> -+#include "exec/ram_addr.h" -> -+#include "qemu/rcu_queue.h" -> -+#include -> -+#ifndef MD5_DIGEST_LENGTH -> 
-+#define MD5_DIGEST_LENGTH 16 -> -+#endif -> -+ -> -+static void check_host_md5(void) -> -+{ -> -+ int i; -> -+ unsigned char md[MD5_DIGEST_LENGTH]; -> -+ rcu_read_lock(); -> -+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check -> -'pc.ram' block */ -> -+ rcu_read_unlock(); -> -+ -> -+ MD5(block->host, block->used_length, md); -> -+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { -> -+ fprintf(stderr, "%02x", md[i]); -> -+ } -> -+ fprintf(stderr, "\n"); -> -+ error_report("end ram md5"); -> -+} -> -+ -> -void qemu_savevm_state_begin(QEMUFile *f, -> -const MigrationParams *params) -> -{ -> -@@ -1056,6 +1079,10 @@ void -> -qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only) -> -save_section_header(f, se, QEMU_VM_SECTION_END); -> -> -ret = se->ops->save_live_complete_precopy(f, se->opaque); -> -+ -> -+ fprintf(stderr, "after saving %s complete\n", se->idstr); -> -+ check_host_md5(); -> -+ -> -trace_savevm_section_end(se->idstr, se->section_id, ret); -> -save_section_footer(f, se); -> -if (ret < 0) { -> -@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, -> -MigrationIncomingState *mis) -> -section_id, le->se->idstr); -> -return ret; -> -} -> -+ if (section_type == QEMU_VM_SECTION_END) { -> -+ error_report("after loading state section id %d(%s)", -> -+ section_id, le->se->idstr); -> -+ check_host_md5(); -> -+ } -> -if (!check_section_footer(f, le)) { -> -return -EINVAL; -> -} -> -@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) -> -} -> -> -cpu_synchronize_all_post_init(); -> -+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); -> -+ check_host_md5(); -> -> -return ret; -> -} - -> -> -Thanks for describing how to reproduce the bug. 
-> -If some pages are not transferred to the destination then it is a bug, so we need -> -to know what the problem is; notice that the problem can be that TCG is not -> -marking some page dirty, that the migration code "forgets" about that page, or -> -anything else altogether, that is what we need to find. -> -> -There are more possibilities, I am not sure that memtest is in 32-bit mode, and -> -it is a possibility that we are missing some state when we are in real -> -mode. -> -> -Will try to take a look at this. -> -> -Thanks, again. -> -Hi Juan & Amit - - Do you think we should add a mechanism to check the data integrity during LM -like Zhijian's patch did? It may be very helpful for developers. - Actually, I did a similar thing before in order to make sure that I did the -right thing when I changed the code related to LM. - -Liang - -On (Fri) 04 Dec 2015 [01:43:07], Li, Liang Z wrote: -> -> -> -> Thanks for describing how to reproduce the bug. -> -> If some pages are not transferred to the destination then it is a bug, so we -> -> need -> -> to know what the problem is; notice that the problem can be that TCG is not -> -> marking some page dirty, that the migration code "forgets" about that page, or -> -> anything else altogether, that is what we need to find. -> -> -> -> There are more possibilities, I am not sure that memtest is in 32-bit mode, -> -> and -> -> it is a possibility that we are missing some state when we are in real -> -> mode. -> -> -> -> Will try to take a look at this. -> -> -> -> Thanks, again. -> -> -> -> -Hi Juan & Amit -> -> -Do you think we should add a mechanism to check the data integrity during LM -> -like Zhijian's patch did? It may be very helpful for developers. -> -Actually, I did a similar thing before in order to make sure that I did -> -the right thing when I changed the code related to LM. -If you mean for debugging, something that's not always on, then I'm -fine with it.
- -A script that goes along with it and shows the result of the -comparison will be helpful too, something that shows how many pages are -different, how many bytes in a page on average, and so on. - - Amit - diff --git a/classification_output/01/mistranslation/14887122 b/classification_output/01/mistranslation/14887122 new file mode 100644 index 00000000..f13db3b8 --- /dev/null +++ b/classification_output/01/mistranslation/14887122 @@ -0,0 +1,258 @@ +mistranslation: 0.930 +semantic: 0.928 +instruction: 0.905 +other: 0.890 + +[BUG][RFC] CPR transfer Issues: Socket permissions and PID files + +Hello, + +While testing CPR transfer I encountered two issues. The first is that the +transfer fails when running with pidfiles due to the destination qemu process +attempting to create the pidfile while it is still locked by the source +process. The second is that the transfer fails when running with the -run-with +user=$USERID parameter. This is because the destination qemu process creates +the UNIX sockets used for the CPR transfer before dropping to the lower +permissioned user, which causes them to be owned by the original user. The +source qemu process then does not have permission to connect to it because it +is already running as the lesser permissioned user. + +Reproducing the first issue: + +Create a source and destination qemu instance associated with the same VM where +both processes have the -pidfile parameter passed on the command line. You +should see the following error on the command line of the second process: + +qemu-system-x86_64: cannot create PID file: Cannot lock pid file: Resource +temporarily unavailable + +Reproducing the second issue: + +Create a source and destination qemu instance associated with the same VM where +both processes have -run-with user=$USERID passed on the command line, where +$USERID is a different user from the one launching the processes. Then attempt +a CPR transfer using UNIX sockets for the main and cpr sockets.
You should +receive the following error via QMP: +{"error": {"class": "GenericError", "desc": "Failed to connect to 'cpr.sock': +Permission denied"}} + +I provided a minimal patch that works around the second issue. + +Thank you, +Ben Chaney + +--- +include/system/os-posix.h | 4 ++++ +os-posix.c | 8 -------- +util/qemu-sockets.c | 21 +++++++++++++++++++++ +3 files changed, 25 insertions(+), 8 deletions(-) + +diff --git a/include/system/os-posix.h b/include/system/os-posix.h +index ce5b3bccf8..2a414a914a 100644 +--- a/include/system/os-posix.h ++++ b/include/system/os-posix.h +@@ -55,6 +55,10 @@ void os_setup_limits(void); +void os_setup_post(void); +int os_mlock(bool on_fault); + ++extern struct passwd *user_pwd; ++extern uid_t user_uid; ++extern gid_t user_gid; ++ +/** +* qemu_alloc_stack: +* @sz: pointer to a size_t holding the requested usable stack size +diff --git a/os-posix.c b/os-posix.c +index 52925c23d3..9369b312a0 100644 +--- a/os-posix.c ++++ b/os-posix.c +@@ -86,14 +86,6 @@ void os_set_proc_name(const char *s) +} + + +-/* +- * Must set all three of these at once. +- * Legal combinations are unset by name by uid +- */ +-static struct passwd *user_pwd; /* NULL non-NULL NULL */ +-static uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ +-static gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ +- +/* +* Prepare to change user ID. user_id can be one of 3 forms: +* - a username, in which case user ID will be changed to its uid, +diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c +index 77477c1cd5..987977ead9 100644 +--- a/util/qemu-sockets.c ++++ b/util/qemu-sockets.c +@@ -871,6 +871,14 @@ static bool saddr_is_tight(UnixSocketAddress *saddr) +#endif +} + ++/* ++ * Must set all three of these at once. 
++ * Legal combinations are unset by name by uid ++ */ ++struct passwd *user_pwd; /* NULL non-NULL NULL */ ++uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ ++gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ ++ +static int unix_listen_saddr(UnixSocketAddress *saddr, +int num, +Error **errp) +@@ -947,6 +955,19 @@ static int unix_listen_saddr(UnixSocketAddress *saddr, +error_setg_errno(errp, errno, "Failed to bind socket to %s", path); +goto err; +} ++ if (user_pwd) { ++ if (chown(un.sun_path, user_pwd->pw_uid, user_pwd->pw_gid) < 0) { ++ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", +path); ++ goto err; ++ } ++ } ++ else if (user_uid != -1 && user_gid != -1) { ++ if (chown(un.sun_path, user_uid, user_gid) < 0) { ++ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", +path); ++ goto err; ++ } ++ } ++ +if (listen(sock, num) < 0) { +error_setg_errno(errp, errno, "Failed to listen on socket"); +goto err; +-- +2.40.1 + +Thank you Ben. I appreciate you testing CPR and shaking out the bugs. +I will study these and propose patches. + +My initial reaction to the pidfile issue is that the orchestration layer must +pass a different filename when starting the destination qemu instance. When +using live update without containers, these types of resource conflicts in the +global namespaces are a known issue. + +- Steve + +On 3/14/2025 2:33 PM, Chaney, Ben wrote: +Hello, + +While testing CPR transfer I encountered two issues. The first is that the +transfer fails when running with pidfiles due to the destination qemu process +attempting to create the pidfile while it is still locked by the source +process. The second is that the transfer fails when running with the -run-with +user=$USERID parameter. This is because the destination qemu process creates +the UNIX sockets used for the CPR transfer before dropping to the lower +permissioned user, which causes them to be owned by the original user. 
The +source qemu process then does not have permission to connect to it because it +is already running as the lesser permissioned user. + +Reproducing the first issue: + +Create a source and destination qemu instance associated with the same VM where +both processes have the -pidfile parameter passed on the command line. You +should see the following error on the command line of the second process: + +qemu-system-x86_64: cannot create PID file: Cannot lock pid file: Resource +temporarily unavailable + +Reproducing the second issue: + +Create a source and destination qemu instance associated with the same VM where +both processes have -run-with user=$USERID passed on the command line, where +$USERID is a different user from the one launching the processes. Then attempt +a CPR transfer using UNIX sockets for the main and cpr sockets. You should +receive the following error via QMP: +{"error": {"class": "GenericError", "desc": "Failed to connect to 'cpr.sock': +Permission denied"}} + +I provided a minimal patch that works around the second issue. + +Thank you, +Ben Chaney + +--- +include/system/os-posix.h | 4 ++++ +os-posix.c | 8 -------- +util/qemu-sockets.c | 21 +++++++++++++++++++++ +3 files changed, 25 insertions(+), 8 deletions(-) + +diff --git a/include/system/os-posix.h b/include/system/os-posix.h +index ce5b3bccf8..2a414a914a 100644 +--- a/include/system/os-posix.h ++++ b/include/system/os-posix.h +@@ -55,6 +55,10 @@ void os_setup_limits(void); +void os_setup_post(void); +int os_mlock(bool on_fault); + ++extern struct passwd *user_pwd; ++extern uid_t user_uid; ++extern gid_t user_gid; ++ +/** +* qemu_alloc_stack: +* @sz: pointer to a size_t holding the requested usable stack size +diff --git a/os-posix.c b/os-posix.c +index 52925c23d3..9369b312a0 100644 +--- a/os-posix.c ++++ b/os-posix.c +@@ -86,14 +86,6 @@ void os_set_proc_name(const char *s) +} + + +-/* +- * Must set all three of these at once. 
+- * Legal combinations are unset by name by uid +- */ +-static struct passwd *user_pwd; /* NULL non-NULL NULL */ +-static uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ +-static gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ +- +/* +* Prepare to change user ID. user_id can be one of 3 forms: +* - a username, in which case user ID will be changed to its uid, +diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c +index 77477c1cd5..987977ead9 100644 +--- a/util/qemu-sockets.c ++++ b/util/qemu-sockets.c +@@ -871,6 +871,14 @@ static bool saddr_is_tight(UnixSocketAddress *saddr) +#endif +} + ++/* ++ * Must set all three of these at once. ++ * Legal combinations are unset by name by uid ++ */ ++struct passwd *user_pwd; /* NULL non-NULL NULL */ ++uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ ++gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ ++ +static int unix_listen_saddr(UnixSocketAddress *saddr, +int num, +Error **errp) +@@ -947,6 +955,19 @@ static int unix_listen_saddr(UnixSocketAddress *saddr, +error_setg_errno(errp, errno, "Failed to bind socket to %s", path); +goto err; +} ++ if (user_pwd) { ++ if (chown(un.sun_path, user_pwd->pw_uid, user_pwd->pw_gid) < 0) { ++ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", +path); ++ goto err; ++ } ++ } ++ else if (user_uid != -1 && user_gid != -1) { ++ if (chown(un.sun_path, user_uid, user_gid) < 0) { ++ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", +path); ++ goto err; ++ } ++ } ++ +if (listen(sock, num) < 0) { +error_setg_errno(errp, errno, "Failed to listen on socket"); +goto err; +-- +2.40.1 + diff --git a/classification_output/01/mistranslation/1693040 b/classification_output/01/mistranslation/1693040 deleted file mode 100644 index 67353acd..00000000 --- a/classification_output/01/mistranslation/1693040 +++ /dev/null @@ -1,1061 +0,0 @@ -mistranslation: 0.862 -semantic: 0.858 -instruction: 0.856 -other: 0.852 - -[Qemu-devel] 答复: Re: 答复: Re: 答复: Re: 答复: Re: [BUG]COLO failover 
hang
-
-hi:
-
-Yes, it is better.
-
-And should we delete
-
-#ifdef WIN32
-
- QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL)
-
-#endif
-
-in qio_channel_socket_accept?
-
-qio_channel_socket_new already has it.
-
-
-Original Mail
-
-From: address@hidden
-To: Wang Guang 10165992
-Cc: address@hidden address@hidden address@hidden address@hidden
-Date: 2017-03-22 15:03
-Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang
-
-Hi,
-
-On 2017/3/22 9:42, address@hidden wrote:
-> diff --git a/migration/socket.c b/migration/socket.c
-> index 13966f1..d65a0ea 100644
-> --- a/migration/socket.c
-> +++ b/migration/socket.c
-> @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
-> }
->
-> trace_migration_socket_incoming_accepted()
->
-> qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
-> + qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
-> migration_channel_process_incoming(migrate_get_current(),
-> QIO_CHANNEL(sioc))
-> object_unref(OBJECT(sioc))
->
-> Is this patch ok?
->
-
-Yes, i think this works, but a better way maybe to call qio_channel_set_feature()
-in qio_channel_socket_accept(), we didn't set the SHUTDOWN feature for the
-socket accept fd,
-Or fix it by this:
-
-diff --git a/io/channel-socket.c b/io/channel-socket.c
-index f546c68..ce6894c 100644
---- a/io/channel-socket.c
-+++ b/io/channel-socket.c
-@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
- Error **errp)
- {
- QIOChannelSocket *cioc
--
-- cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET))
-- cioc->fd = -1
-+
-+ cioc = qio_channel_socket_new()
- cioc->remoteAddrLen = sizeof(ioc->remoteAddr)
- cioc->localAddrLen = sizeof(ioc->localAddr)
-
-
-Thanks,
-Hailiang
-
-> I have tested it. The test does not hang any more.
->
-> Original Mail
->
-> From: address@hidden
-> To: address@hidden address@hidden
-> Cc: address@hidden address@hidden address@hidden
-> Date: 2017-03-22 09:11
-> Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
->
-> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
-> > * Hailiang Zhang (address@hidden) wrote:
-> >> Hi,
-> >>
-> >> Thanks for reporting this, and i confirmed it in my test, and it is a bug.
-> >>
-> >> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
-> >> case the COLO thread/incoming thread is stuck in read/write() while doing
-failover, it didn't take effect, because all the fds used by COLO (also migration)
-> >> have been wrapped by qio channel, and it will not call the shutdown API if
-> >> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc),
-QIO_CHANNEL_FEATURE_SHUTDOWN).
-> >>
-> >> Cc: Dr. David Alan Gilbert address@hidden
-> >>
-> >> I doubt migration cancel has the same problem; it may be stuck in write()
-> >> if we try to cancel migration.
-> >>
-> >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
-Error **errp)
-> >> {
-> >> qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing")
-> >> migration_channel_connect(s, ioc, NULL)
-> >> ... ...
-> >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
-> >> and the
-> >> migrate_fd_cancel()
-> >> {
-> >> ... ...
-> >> if (s->state == MIGRATION_STATUS_CANCELLING && f) {
-> >> qemu_file_shutdown(f) --> This will not take effect. No ?
-> >> }
-> >> }
-> >
-> > (cc'd in Daniel Berrange).
-> > I see that we call qio_channel_set_feature(ioc,
-QIO_CHANNEL_FEATURE_SHUTDOWN) at the
-> > top of qio_channel_socket_new so I think that's safe isn't it?
-> >
->
-> Hmm, you are right, this problem only exists for the migration incoming fd,
-thanks.
-> -> > Dave
-> >
-> >> Thanks,
-> >> Hailiang
-> >>
-> >> On 2017/3/21 16:10, address@hidden wrote:
-> >>> Thank you.
-> >>>
-> >>> I have tested it already.
-> >>>
-> >>> When the Primary Node panics, the Secondary Node qemu hangs at the same
-place.
-> >>>
-> >>> According to
-http://wiki.qemu-project.org/Features/COLO
-, killing the Primary Node
-qemu does not reproduce the problem, but a Primary Node panic does.
-> >>>
-> >>> I think this is because the channel does not support
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-> >>>
-> >>> When failover happens, channel_shutdown could not shut down the channel,
-> >>>
-> >>> so colo_process_incoming_thread will hang at recvmsg.
-> >>>
-> >>> I tested a patch:
-> >>>
-> >>> diff --git a/migration/socket.c b/migration/socket.c
-> >>> index 13966f1..d65a0ea 100644
-> >>> --- a/migration/socket.c
-> >>> +++ b/migration/socket.c
-> >>> @@ -147,8 +147,9 @@ static gboolean
-socket_accept_incoming_migration(QIOChannel *ioc,
-> >>> }
-> >>>
-> >>> trace_migration_socket_incoming_accepted()
-> >>>
-> >>> qio_channel_set_name(QIO_CHANNEL(sioc),
-"migration-socket-incoming")
-> >>> + qio_channel_set_feature(QIO_CHANNEL(sioc),
-QIO_CHANNEL_FEATURE_SHUTDOWN)
-> >>> migration_channel_process_incoming(migrate_get_current(),
-> >>> QIO_CHANNEL(sioc))
-> >>> object_unref(OBJECT(sioc))
-> >>>
-> >>> My test does not hang any more.
-> >>>
-> >>>
-> >>> Original Mail
-> >>>
-> >>> From: address@hidden
-> >>> To: Wang Guang 10165992 address@hidden
-> >>> Cc: address@hidden address@hidden
-> >>> Date: 2017-03-21 15:58
-> >>> Subject: Re: [Qemu-devel] Reply: Re: [BUG]COLO failover hang
-> >>>
-> >>>
-> >>> Hi, Wang.
-> >>> -> >>> You can test this branch: -> >>> -> >>> -https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk -> >>> -> >>> and please follow wiki ensure your own configuration correctly. -> >>> -> >>> -http://wiki.qemu-project.org/Features/COLO -> >>> -> >>> -> >>> Thanks -> >>> -> >>> Zhang Chen -> >>> -> >>> -> >>> On 03/21/2017 03:27 PM, address@hidden wrote: -> >>> > -> >>> > hi. -> >>> > -> >>> > I test the git qemu master have the same problem. -> >>> > -> >>> > (gdb) bt -> >>> > -> >>> > #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, -> >>> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 -> >>> > -> >>> > #1 0x00007f658e4aa0c2 in qio_channel_read -> >>> > (address@hidden, address@hidden "", -> >>> > address@hidden, address@hidden) at io/channel.c:114 -> >>> > -> >>> > #2 0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>, -> >>> > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at -> >>> > migration/qemu-file-channel.c:78 -> >>> > -> >>> > #3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at -> >>> > migration/qemu-file.c:295 -> >>> > -> >>> > #4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, -> >>> > address@hidden) at migration/qemu-file.c:555 -> >>> > -> >>> > #5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at -> >>> > migration/qemu-file.c:568 -> >>> > -> >>> > #6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at -> >>> > migration/qemu-file.c:648 -> >>> > -> >>> > #7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, -> >>> > address@hidden) at migration/colo.c:244 -> >>> > -> >>> > #8 0x00007f658e3e681e in colo_receive_check_message (f=<optimized -> >>> > out>, address@hidden, -> >>> > address@hidden) -> >>> > -> >>> > at migration/colo.c:264 -> >>> > -> >>> > #9 0x00007f658e3e740e in colo_process_incoming_thread -> >>> > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577 -> >>> > -> >>> > #10 
0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 -> >>> > -> >>> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6 -> >>> > -> >>> > (gdb) p ioc->name -> >>> > -> >>> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming" -> >>> > -> >>> > (gdb) p ioc->features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN -> >>> > -> >>> > $3 = 0 -> >>> > -> >>> > -> >>> > (gdb) bt -> >>> > -> >>> > #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, -> >>> > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 -> >>> > -> >>> > #1 0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at -> >>> > gmain.c:3054 -> >>> > -> >>> > #2 g_main_context_dispatch (context=<optimized out>, -> >>> > address@hidden) at gmain.c:3630 -> >>> > -> >>> > #3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 -> >>> > -> >>> > #4 os_host_main_loop_wait (timeout=<optimized out>) at -> >>> > util/main-loop.c:258 -> >>> > -> >>> > #5 main_loop_wait (address@hidden) at -> >>> > util/main-loop.c:506 -> >>> > -> >>> > #6 0x00007fdccb526187 in main_loop () at vl.c:1898 -> >>> > -> >>> > #7 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized -> >>> > out>) at vl.c:4709 -> >>> > -> >>> > (gdb) p ioc->features -> >>> > -> >>> > $1 = 6 -> >>> > -> >>> > (gdb) p ioc->name -> >>> > -> >>> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener" -> >>> > -> >>> > -> >>> > May be socket_accept_incoming_migration should -> >>> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? -> >>> > -> >>> > -> >>> > thank you. 
-> >>> >
-> >>> > Original Mail
-> >>> > address@hidden
-> >>> > address@hidden
-> >>> > address@hidden@huawei.com>
-> >>> > *Date:* 2017-03-16 14:46
-> >>> > *Subject:* *Re: [Qemu-devel] COLO failover hang*
-> >>> >
-> >>> >
-> >>> > On 03/15/2017 05:06 PM, wangguang wrote:
-> >>> > > I am testing the QEMU COLO feature described here [QEMU
-> >>> > > Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-> >>> > >
-> >>> > > When the Primary Node panics, the Secondary Node qemu hangs,
-> >>> > > at recvmsg in qio_channel_socket_readv.
-> >>> > > And I ran { 'execute': 'nbd-server-stop' } and { "execute":
-> >>> > > "x-colo-lost-heartbeat" } in the Secondary VM's
-> >>> > > monitor; the Secondary Node qemu still hangs at recvmsg.
-> >>> > >
-> >>> > > I found that COLO in qemu is not complete yet.
-> >>> > > Does COLO have any development plan?
-> >>> >
-> >>> > Yes, we are developing it. You can see some of the patches we are pushing.
-> >>> >
-> >>> > > Has anyone ever run it successfully? Any help is appreciated!
-> >>> >
-> >>> > Our internal version can run it successfully.
-> >>> > For the failover details you can ask Zhanghailiang for help.
-> >>> > Next time if you have some question about COLO, -> >>> > please cc me and zhanghailiang address@hidden -> >>> > -> >>> > -> >>> > Thanks -> >>> > Zhang Chen -> >>> > -> >>> > -> >>> > > -> >>> > > -> >>> > > -> >>> > > centos7.2+qemu2.7.50 -> >>> > > (gdb) bt -> >>> > > #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 -> >>> > > #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized -out>, -> >>> > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, -errp=0x0) at -> >>> > > io/channel-socket.c:497 -> >>> > > #2 0x00007f3e03329472 in qio_channel_read (address@hidden, -> >>> > > address@hidden "", address@hidden, -> >>> > > address@hidden) at io/channel.c:97 -> >>> > > #3 0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>, -> >>> > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at -> >>> > > migration/qemu-file-channel.c:78 -> >>> > > #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at -> >>> > > migration/qemu-file.c:257 -> >>> > > #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, -> >>> > > address@hidden) at migration/qemu-file.c:510 -> >>> > > #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at -> >>> > > migration/qemu-file.c:523 -> >>> > > #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at -> >>> > > migration/qemu-file.c:603 -> >>> > > #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, -> >>> > > address@hidden) at migration/colo.c:215 -> >>> > > #9 0x00007f3e0327250d in colo_wait_handle_message -(errp=0x7f3d62bfaa48, -> >>> > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at -> >>> > > migration/colo.c:546 -> >>> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at -> >>> > > migration/colo.c:649 -> >>> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 -> >>> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc..so.6 -> >>> > > -> >>> > > -> >>> > > -> >>> > > -> >>> > > -> >>> > > -- -> 
>>> > > View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-> >>> > > Sent from the Developer mailing list archive at Nabble.com.
-> >>> > >
-> >>> >
-> >>> > --
-> >>> > Thanks
-> >>> > Zhang Chen
-> >>> >
-> >>>
-> >>
-> > --
-> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
-> >
-> > .
-> >
->
-
-On 2017/3/22 16:09, address@hidden wrote:
-hi:
-
-yes.it is better.
-
-And should we delete
-Yes, you are right.
-#ifdef WIN32
-
- QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL)
-
-#endif
-
-in qio_channel_socket_accept?
-
-qio_channel_socket_new already have it.
-
diff --git a/classification_output/01/mistranslation/22219210 b/classification_output/01/mistranslation/22219210
new file mode 100644
index 00000000..95c3f61d
--- /dev/null
+++ b/classification_output/01/mistranslation/22219210
@@ -0,0 +1,43 @@
+mistranslation: 0.472
+semantic: 0.387
+other: 0.345
+instruction: 0.261
+
+[BUG][CPU hot-plug]CPU hot-plugs cause the qemu process to coredump
+
+Hello,Recently, when I was developing CPU hot-plugs under the loongarch
+architecture,
+I found that there was a problem with qemu cpu hot-plugs under x86
+architecture,
+which caused the qemu process coredump when repeatedly inserting and
+unplugging
+the CPU when the TCG was accelerated.
+ + +The specific operation process is as follows: + +1.Use the following command to start the virtual machine + +qemu-system-x86_64 \ +-machine q35  \ +-cpu Broadwell-IBRS \ +-smp 1,maxcpus=4,sockets=4,cores=1,threads=1 \ +-m 4G \ +-drive file=~/anolis-8.8.qcow2  \ +-serial stdio   \ +-monitor telnet:localhost:4498,server,nowait + + +2.Enter QEMU Monitor via telnet for repeated CPU insertion and unplugging + +telnet 127.0.0.1 4498 +(qemu) device_add +Broadwell-IBRS-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=cpu1 +(qemu) device_del cpu1 +(qemu) device_add +Broadwell-IBRS-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=cpu1 +3.You will notice that the QEMU process has a coredump + +# malloc(): unsorted double linked list corrupted +Aborted (core dumped) + diff --git a/classification_output/01/mistranslation/23270873 b/classification_output/01/mistranslation/23270873 new file mode 100644 index 00000000..e4d4789c --- /dev/null +++ b/classification_output/01/mistranslation/23270873 @@ -0,0 +1,692 @@ +mistranslation: 0.881 +other: 0.839 +instruction: 0.755 +semantic: 0.752 + +[Qemu-devel] [BUG?] aio_get_linux_aio: Assertion `ctx->linux_aio' failed + +Hi, + +I am seeing some strange QEMU assertion failures for qemu on s390x, +which prevents a guest from starting. + +Git bisecting points to the following commit as the source of the error. + +commit ed6e2161715c527330f936d44af4c547f25f687e +Author: Nishanth Aravamudan +Date: Fri Jun 22 12:37:00 2018 -0700 + + linux-aio: properly bubble up errors from initialization + + laio_init() can fail for a couple of reasons, which will lead to a NULL + pointer dereference in laio_attach_aio_context(). + + To solve this, add a aio_setup_linux_aio() function which is called + early in raw_open_common. If this fails, propagate the error up. The + signature of aio_get_linux_aio() was not modified, because it seems + preferable to return the actual errno from the possible failing + initialization calls. 
+ + Additionally, when the AioContext changes, we need to associate a + LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context + callback and call the new aio_setup_linux_aio(), which will allocate a +new AioContext if needed, and return errors on failures. If it +fails for +any reason, fallback to threaded AIO with an error message, as the + device is already in-use by the guest. + + Add an assert that aio_get_linux_aio() cannot return NULL. + + Signed-off-by: Nishanth Aravamudan + Message-id: address@hidden + Signed-off-by: Stefan Hajnoczi +Not sure what is causing this assertion to fail. Here is the qemu +command line of the guest, from qemu log, which throws this error: +LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin +QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-s390x -name +guest=rt_vm1,debug-threads=on -S -object +secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-rt_vm1/master-key.aes +-machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off -m +1024 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object +iothread,id=iothread1 -uuid 0cde16cd-091d-41bd-9ac2-5243df5c9a0d +-display none -no-user-config -nodefaults -chardev +socket,id=charmonitor,fd=28,server,nowait -mon +chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown +-boot strict=on -drive +file=/dev/mapper/360050763998b0883980000002a000031,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native +-device +virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on +-netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device +virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:3a:c8:67:95:84,devno=fe.0.0000 +-netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 -device +virtio-net-ccw,netdev=hostnet1,id=net1,mac=52:54:00:2a:e5:08,devno=fe.0.0002 +-chardev pty,id=charconsole0 -device +sclpconsole,chardev=charconsole0,id=console0 -device 
+virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -sandbox +on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny +-msg timestamp=on +2018-07-17 15:48:42.252+0000: Domain id=21 is tainted: high-privileges +2018-07-17T15:48:42.279380Z qemu-system-s390x: -chardev +pty,id=charconsole0: char device redirected to /dev/pts/3 (label +charconsole0) +qemu-system-s390x: util/async.c:339: aio_get_linux_aio: Assertion +`ctx->linux_aio' failed. +2018-07-17 15:48:43.309+0000: shutting down, reason=failed + + +Any help debugging this would be greatly appreciated. + +Thank you +Farhan + +On 17.07.2018 [13:25:53 -0400], Farhan Ali wrote: +> +Hi, +> +> +I am seeing some strange QEMU assertion failures for qemu on s390x, +> +which prevents a guest from starting. +> +> +Git bisecting points to the following commit as the source of the error. +> +> +commit ed6e2161715c527330f936d44af4c547f25f687e +> +Author: Nishanth Aravamudan +> +Date: Fri Jun 22 12:37:00 2018 -0700 +> +> +linux-aio: properly bubble up errors from initialization +> +> +laio_init() can fail for a couple of reasons, which will lead to a NULL +> +pointer dereference in laio_attach_aio_context(). +> +> +To solve this, add a aio_setup_linux_aio() function which is called +> +early in raw_open_common. If this fails, propagate the error up. The +> +signature of aio_get_linux_aio() was not modified, because it seems +> +preferable to return the actual errno from the possible failing +> +initialization calls. +> +> +Additionally, when the AioContext changes, we need to associate a +> +LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context +> +callback and call the new aio_setup_linux_aio(), which will allocate a +> +new AioContext if needed, and return errors on failures. If it fails for +> +any reason, fallback to threaded AIO with an error message, as the +> +device is already in-use by the guest. +> +> +Add an assert that aio_get_linux_aio() cannot return NULL. 
+> +> +Signed-off-by: Nishanth Aravamudan +> +Message-id: address@hidden +> +Signed-off-by: Stefan Hajnoczi +> +> +> +Not sure what is causing this assertion to fail. Here is the qemu command +> +line of the guest, from qemu log, which throws this error: +> +> +> +LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin +> +QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-s390x -name +> +guest=rt_vm1,debug-threads=on -S -object +> +secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-rt_vm1/master-key.aes +> +-machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off -m 1024 +> +-realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object +> +iothread,id=iothread1 -uuid 0cde16cd-091d-41bd-9ac2-5243df5c9a0d -display +> +none -no-user-config -nodefaults -chardev +> +socket,id=charmonitor,fd=28,server,nowait -mon +> +chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot +> +strict=on -drive +> +file=/dev/mapper/360050763998b0883980000002a000031,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native +> +-device +> +virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on +> +-netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device +> +virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:3a:c8:67:95:84,devno=fe.0.0000 +> +-netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 -device +> +virtio-net-ccw,netdev=hostnet1,id=net1,mac=52:54:00:2a:e5:08,devno=fe.0.0002 +> +-chardev pty,id=charconsole0 -device +> +sclpconsole,chardev=charconsole0,id=console0 -device +> +virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -sandbox +> +on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg +> +timestamp=on +> +> +> +> +2018-07-17 15:48:42.252+0000: Domain id=21 is tainted: high-privileges +> +2018-07-17T15:48:42.279380Z qemu-system-s390x: -chardev pty,id=charconsole0: +> +char device redirected to /dev/pts/3 (label charconsole0) +> 
+qemu-system-s390x: util/async.c:339: aio_get_linux_aio: Assertion +> +`ctx->linux_aio' failed. +> +2018-07-17 15:48:43.309+0000: shutting down, reason=failed +> +> +> +Any help debugging this would be greatly appreciated. +iiuc, this possibly implies AIO was not actually used previously on this +guest (it might have silently been falling back to threaded IO?). I +don't have access to s390x, but would it be possible to run qemu under +gdb and see if aio_setup_linux_aio is being called at all (I think it +might not be, but I'm not sure why), and if so, if it's for the context +in question? + +If it's not being called first, could you see what callpath is calling +aio_get_linux_aio when this assertion trips? + +Thanks! +-Nish + +On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: +iiuc, this possibly implies AIO was not actually used previously on this +guest (it might have silently been falling back to threaded IO?). I +don't have access to s390x, but would it be possible to run qemu under +gdb and see if aio_setup_linux_aio is being called at all (I think it +might not be, but I'm not sure why), and if so, if it's for the context +in question? + +If it's not being called first, could you see what callpath is calling +aio_get_linux_aio when this assertion trips? + +Thanks! 
+-Nish +Hi Nishant, +From the coredump of the guest this is the call trace that calls +aio_get_linux_aio: +Stack trace of thread 145158: +#0 0x000003ff94dbe274 raise (libc.so.6) +#1 0x000003ff94da39a8 abort (libc.so.6) +#2 0x000003ff94db62ce __assert_fail_base (libc.so.6) +#3 0x000003ff94db634c __assert_fail (libc.so.6) +#4 0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) +#5 0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) +#6 0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) +#7 0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) +#8 0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) +#9 0x000002aa20db3c34 aio_poll (qemu-system-s390x) +#10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) +#11 0x000003ff94f879a8 start_thread (libpthread.so.0) +#12 0x000003ff94e797ee thread_start (libc.so.6) + + +Thanks for taking a look and responding. + +Thanks +Farhan + +On 07/18/2018 09:42 AM, Farhan Ali wrote: +On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: +iiuc, this possibly implies AIO was not actually used previously on this +guest (it might have silently been falling back to threaded IO?). I +don't have access to s390x, but would it be possible to run qemu under +gdb and see if aio_setup_linux_aio is being called at all (I think it +might not be, but I'm not sure why), and if so, if it's for the context +in question? + +If it's not being called first, could you see what callpath is calling +aio_get_linux_aio when this assertion trips? + +Thanks! 
+-Nish +Hi Nishant, +From the coredump of the guest this is the call trace that calls +aio_get_linux_aio: +Stack trace of thread 145158: +#0  0x000003ff94dbe274 raise (libc.so.6) +#1  0x000003ff94da39a8 abort (libc.so.6) +#2  0x000003ff94db62ce __assert_fail_base (libc.so.6) +#3  0x000003ff94db634c __assert_fail (libc.so.6) +#4  0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) +#5  0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) +#6  0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) +#7  0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) +#8  0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) +#9  0x000002aa20db3c34 aio_poll (qemu-system-s390x) +#10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) +#11 0x000003ff94f879a8 start_thread (libpthread.so.0) +#12 0x000003ff94e797ee thread_start (libc.so.6) + + +Thanks for taking a look and responding. + +Thanks +Farhan +Trying to debug a little further, the block device in this case is a +"host device". And looking at your commit carefully you use the +bdrv_attach_aio_context callback to setup a Linux AioContext. +For some reason the "host device" struct (BlockDriver bdrv_host_device +in block/file-posix.c) does not have a bdrv_attach_aio_context defined. +So a simple change of adding the callback to the struct solves the issue +and the guest starts fine. +diff --git a/block/file-posix.c b/block/file-posix.c +index 28824aa..b8d59fb 100644 +--- a/block/file-posix.c ++++ b/block/file-posix.c +@@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { + .bdrv_refresh_limits = raw_refresh_limits, + .bdrv_io_plug = raw_aio_plug, + .bdrv_io_unplug = raw_aio_unplug, ++ .bdrv_attach_aio_context = raw_aio_attach_aio_context, + + .bdrv_co_truncate = raw_co_truncate, + .bdrv_getlength = raw_getlength, +I am not too familiar with block device code in QEMU, so not sure if +this is the right fix or if there are some underlying problems. 
+Thanks +Farhan + +On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: +> +> +> +On 07/18/2018 09:42 AM, Farhan Ali wrote: +> +> +> +> +> +> On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: +> +> > iiuc, this possibly implies AIO was not actually used previously on this +> +> > guest (it might have silently been falling back to threaded IO?). I +> +> > don't have access to s390x, but would it be possible to run qemu under +> +> > gdb and see if aio_setup_linux_aio is being called at all (I think it +> +> > might not be, but I'm not sure why), and if so, if it's for the context +> +> > in question? +> +> > +> +> > If it's not being called first, could you see what callpath is calling +> +> > aio_get_linux_aio when this assertion trips? +> +> > +> +> > Thanks! +> +> > -Nish +> +> +> +> +> +> Hi Nishant, +> +> +> +> From the coredump of the guest this is the call trace that calls +> +> aio_get_linux_aio: +> +> +> +> +> +> Stack trace of thread 145158: +> +> #0  0x000003ff94dbe274 raise (libc.so.6) +> +> #1  0x000003ff94da39a8 abort (libc.so.6) +> +> #2  0x000003ff94db62ce __assert_fail_base (libc.so.6) +> +> #3  0x000003ff94db634c __assert_fail (libc.so.6) +> +> #4  0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) +> +> #5  0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) +> +> #6  0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) +> +> #7  0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) +> +> #8  0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) +> +> #9  0x000002aa20db3c34 aio_poll (qemu-system-s390x) +> +> #10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) +> +> #11 0x000003ff94f879a8 start_thread (libpthread.so.0) +> +> #12 0x000003ff94e797ee thread_start (libc.so.6) +> +> +> +> +> +> Thanks for taking a look and responding. +> +> +> +> Thanks +> +> Farhan +> +> +> +> +> +> +> +> +Trying to debug a little further, the block device in this case is a "host +> +device". 
And looking at your commit carefully you use the +> +bdrv_attach_aio_context callback to setup a Linux AioContext. +> +> +For some reason the "host device" struct (BlockDriver bdrv_host_device in +> +block/file-posix.c) does not have a bdrv_attach_aio_context defined. +> +So a simple change of adding the callback to the struct solves the issue and +> +the guest starts fine. +> +> +> +diff --git a/block/file-posix.c b/block/file-posix.c +> +index 28824aa..b8d59fb 100644 +> +--- a/block/file-posix.c +> ++++ b/block/file-posix.c +> +@@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { +> +.bdrv_refresh_limits = raw_refresh_limits, +> +.bdrv_io_plug = raw_aio_plug, +> +.bdrv_io_unplug = raw_aio_unplug, +> ++ .bdrv_attach_aio_context = raw_aio_attach_aio_context, +> +> +.bdrv_co_truncate = raw_co_truncate, +> +.bdrv_getlength = raw_getlength, +> +> +> +> +I am not too familiar with block device code in QEMU, so not sure if +> +this is the right fix or if there are some underlying problems. +Oh this is quite embarrassing! I only added the bdrv_attach_aio_context +callback for the file-backed device. Your fix is definitely correct for +host device. Let me make sure there weren't any others missed and I will +send out a properly formatted patch. Thank you for the quick testing and +turnaround! + +-Nish + +On 07/18/2018 08:52 PM, Nishanth Aravamudan wrote: +> +On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: +> +> +> +> +> +> On 07/18/2018 09:42 AM, Farhan Ali wrote: +> +>> +> +>> +> +>> On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: +> +>>> iiuc, this possibly implies AIO was not actually used previously on this +> +>>> guest (it might have silently been falling back to threaded IO?). I +> +>>> don't have access to s390x, but would it be possible to run qemu under +> +>>> gdb and see if aio_setup_linux_aio is being called at all (I think it +> +>>> might not be, but I'm not sure why), and if so, if it's for the context
+> +>>> +> +>>> If it's not being called first, could you see what callpath is calling +> +>>> aio_get_linux_aio when this assertion trips? +> +>>> +> +>>> Thanks! +> +>>> -Nish +> +>> +> +>> +> +>> Hi Nishant, +> +>> +> +>> From the coredump of the guest this is the call trace that calls +> +>> aio_get_linux_aio: +> +>> +> +>> +> +>> Stack trace of thread 145158: +> +>> #0  0x000003ff94dbe274 raise (libc.so.6) +> +>> #1  0x000003ff94da39a8 abort (libc.so.6) +> +>> #2  0x000003ff94db62ce __assert_fail_base (libc.so.6) +> +>> #3  0x000003ff94db634c __assert_fail (libc.so.6) +> +>> #4  0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) +> +>> #5  0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) +> +>> #6  0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) +> +>> #7  0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) +> +>> #8  0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) +> +>> #9  0x000002aa20db3c34 aio_poll (qemu-system-s390x) +> +>> #10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) +> +>> #11 0x000003ff94f879a8 start_thread (libpthread.so.0) +> +>> #12 0x000003ff94e797ee thread_start (libc.so.6) +> +>> +> +>> +> +>> Thanks for taking a look and responding. +> +>> +> +>> Thanks +> +>> Farhan +> +>> +> +>> +> +>> +> +> +> +> Trying to debug a little further, the block device in this case is a "host +> +> device". And looking at your commit carefully you use the +> +> bdrv_attach_aio_context callback to setup a Linux AioContext. +> +> +> +> For some reason the "host device" struct (BlockDriver bdrv_host_device in +> +> block/file-posix.c) does not have a bdrv_attach_aio_context defined. +> +> So a simple change of adding the callback to the struct solves the issue and +> +> the guest starts fine. 
+> +> +> +> +> +> diff --git a/block/file-posix.c b/block/file-posix.c +> +> index 28824aa..b8d59fb 100644 +> +> --- a/block/file-posix.c +> +> +++ b/block/file-posix.c +> +> @@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { +> +> .bdrv_refresh_limits = raw_refresh_limits, +> +> .bdrv_io_plug = raw_aio_plug, +> +> .bdrv_io_unplug = raw_aio_unplug, +> +> + .bdrv_attach_aio_context = raw_aio_attach_aio_context, +> +> +> +> .bdrv_co_truncate = raw_co_truncate, +> +> .bdrv_getlength = raw_getlength, +> +> +> +> +> +> +> +> I am not too familiar with block device code in QEMU, so not sure if +> +> this is the right fix or if there are some underlying problems. +> +> +Oh this is quite embarrassing! I only added the bdrv_attach_aio_context +> +callback for the file-backed device. Your fix is definitely correct for +> +host device. Let me make sure there weren't any others missed and I will +> +send out a properly formatted patch. Thank you for the quick testing and +> +turnaround! +Farhan, can you respin your patch with proper sign-off and patch description? +Adding qemu-block. + +Hi Christian, + +On 19.07.2018 [08:55:20 +0200], Christian Borntraeger wrote: +> +> +> +On 07/18/2018 08:52 PM, Nishanth Aravamudan wrote: +> +> On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: +> +>> +> +>> +> +>> On 07/18/2018 09:42 AM, Farhan Ali wrote: + + +> +>> I am not too familiar with block device code in QEMU, so not sure if +> +>> this is the right fix or if there are some underlying problems. +> +> +> +> Oh this is quite embarrassing! I only added the bdrv_attach_aio_context +> +> callback for the file-backed device. Your fix is definitely correct for +> +> host device. Let me make sure there weren't any others missed and I will +> +> send out a properly formatted patch. Thank you for the quick testing and +> +> turnaround! +> +> +Farhan, can you respin your patch with proper sign-off and patch description? +> +Adding qemu-block.
+I sent it yesterday, sorry I didn't cc everyone from this e-mail: +http://lists.nongnu.org/archive/html/qemu-block/2018-07/msg00516.html +Thanks, +Nish + diff --git a/classification_output/01/mistranslation/24930826 b/classification_output/01/mistranslation/24930826 new file mode 100644 index 00000000..5f79c452 --- /dev/null +++ b/classification_output/01/mistranslation/24930826 @@ -0,0 +1,33 @@ +mistranslation: 0.637 +instruction: 0.555 +other: 0.535 +semantic: 0.487 + +[Qemu-devel] [BUG] vhost-user: hot-unplug vhost-user nic for windows guest OS will fail with 100% reproduce rate + +Hi, guys + +I met a problem when hot-unplugging a vhost-user nic for Windows 2008 rc2 sp1 64 +(Guest OS) + +The xml of the nic is as follows: + + + + + + +
+ + +Firstly, I use virsh attach-device win2008 vif.xml to hot-plug a nic for the Guest +OS. This operation returns success. +After the guest OS discovers the nic successfully, I use virsh detach-device win2008 +vif.xml to hot-unplug it. This operation fails with a 100% reproduction rate. + +However, if I hot-plug and hot-unplug a virtio-net nic, it does not fail. + +I have analyzed the process of qmp_device_del, and I found that qemu injects an +interrupt into ACPI to let it notify the guest OS to remove the nic. +I guess there is something wrong in how Windows handles the interrupt. + diff --git a/classification_output/01/mistranslation/25842545 b/classification_output/01/mistranslation/25842545 new file mode 100644 index 00000000..1ebfe288 --- /dev/null +++ b/classification_output/01/mistranslation/25842545 @@ -0,0 +1,202 @@ +mistranslation: 0.928 +other: 0.912 +instruction: 0.835 +semantic: 0.829 + +[Qemu-devel] [Bug?] Guest pause because VMPTRLD failed in KVM + +Hello, + + We encountered a problem where a guest paused because the KMOD reported that VMPTRLD +failed.
+ +The related information is as follows: + +1) Qemu command: + /usr/bin/qemu-kvm -name omu1 -S -machine pc-i440fx-2.3,accel=kvm,usb=off -cpu +host -m 15625 -realtime mlock=off -smp 8,sockets=1,cores=8,threads=1 -uuid +a2aacfff-6583-48b4-b6a4-e6830e519931 -no-user-config -nodefaults -chardev +socket,id=charmonitor,path=/var/lib/libvirt/qemu/omu1.monitor,server,nowait +-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown +-boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device +virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive +file=/home/env/guest1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native + -device +virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0 + -drive +file=/home/env/guest_300G.img,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native + -device +virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 + -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device +virtio-net-pci,netdev=hostnet0,id=net0,mac=00:00:80:05:00:00,bus=pci.0,addr=0x3 +-netdev tap,fd=27,id=hostnet1,vhost=on,vhostfd=28 -device +virtio-net-pci,netdev=hostnet1,id=net1,mac=00:00:80:05:00:01,bus=pci.0,addr=0x4 +-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 +-device usb-tablet,id=input0 -vnc 0.0.0.0:0 -device +cirrus-vga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device +virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on + + 2) Qemu log: + KVM: entry failed, hardware error 0x4 + RAX=00000000ffffffed RBX=ffff8803fa2d7fd8 RCX=0100000000000000 +RDX=0000000000000000 + RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8803fa2d7e90 +RSP=ffff8803fa2efe90 + R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 +R11=000000000000b69a + R12=0000000000000001 R13=ffffffff81a25b40 R14=0000000000000000 +R15=ffff8803fa2d7fd8 + RIP=ffffffff81053e16 RFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 
+ ES =0000 0000000000000000 ffffffff 00c00000 + CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA] + SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA] + DS =0000 0000000000000000 ffffffff 00c00000 + FS =0000 0000000000000000 ffffffff 00c00000 + GS =0000 ffff88040f540000 ffffffff 00c00000 + LDT=0000 0000000000000000 ffffffff 00c00000 + TR =0040 ffff88040f550a40 00002087 00008b00 DPL=0 TSS64-busy + GDT= ffff88040f549000 0000007f + IDT= ffffffffff529000 00000fff + CR0=80050033 CR2=00007f81ca0c5000 CR3=00000003f5081000 CR4=000407e0 + DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 +DR3=0000000000000000 + DR6=00000000ffff0ff0 DR7=0000000000000400 + EFER=0000000000000d01 + Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? +?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? + + 3) Demsg + [347315.028339] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed + klogd 1.4.1, ---------- state change ---------- + [347315.039506] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed + [347315.051728] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed + [347315.057472] vmwrite error: reg 6c0a value ffff88307e66e480 (err +2120672384) + [347315.064567] Pid: 69523, comm: qemu-kvm Tainted: GF X +3.0.93-0.8-default #1 + [347315.064569] Call Trace: + [347315.064587] [] dump_trace+0x75/0x300 + [347315.064595] [] dump_stack+0x69/0x6f + [347315.064617] [] vmx_vcpu_load+0x11e/0x1d0 [kvm_intel] + [347315.064647] [] kvm_arch_vcpu_load+0x44/0x1d0 [kvm] + [347315.064669] [] finish_task_switch+0x81/0xe0 + [347315.064676] [] thread_return+0x3b/0x2a7 + [347315.064687] [] kvm_vcpu_block+0x65/0xa0 [kvm] + [347315.064703] [] __vcpu_run+0xd1/0x260 [kvm] + [347315.064732] [] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 +[kvm] + [347315.064759] [] kvm_vcpu_ioctl+0x38e/0x580 [kvm] + [347315.064771] [] do_vfs_ioctl+0x8b/0x3b0 + [347315.064776] [] sys_ioctl+0xa1/0xb0 + [347315.064783] [] system_call_fastpath+0x16/0x1b + [347315.064797] 
[<00007fee51969ce7>] 0x7fee51969ce6 + [347315.064799] vmwrite error: reg 6c0c value ffff88307e664000 (err +2120630272) + [347315.064802] Pid: 69523, comm: qemu-kvm Tainted: GF X +3.0.93-0.8-default #1 + [347315.064803] Call Trace: + [347315.064807] [] dump_trace+0x75/0x300 + [347315.064811] [] dump_stack+0x69/0x6f + [347315.064817] [] vmx_vcpu_load+0x12c/0x1d0 [kvm_intel] + [347315.064832] [] kvm_arch_vcpu_load+0x44/0x1d0 [kvm] + [347315.064851] [] finish_task_switch+0x81/0xe0 + [347315.064855] [] thread_return+0x3b/0x2a7 + [347315.064865] [] kvm_vcpu_block+0x65/0xa0 [kvm] + [347315.064880] [] __vcpu_run+0xd1/0x260 [kvm] + [347315.064907] [] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 +[kvm] + [347315.064933] [] kvm_vcpu_ioctl+0x38e/0x580 [kvm] + [347315.064943] [] do_vfs_ioctl+0x8b/0x3b0 + [347315.064947] [] sys_ioctl+0xa1/0xb0 + [347315.064951] [] system_call_fastpath+0x16/0x1b + [347315.064957] [<00007fee51969ce7>] 0x7fee51969ce6 + [347315.064959] vmwrite error: reg 6c10 value 0 (err 0) + + 4) The issue can't be reproduced. I searched the Intel VMX spec about reasons +for vmptrld failure: + The instruction fails if its operand is not properly aligned, sets +unsupported physical-address bits, or is equal to the VMXON + pointer. In addition, the instruction fails if the 32 bits in memory +referenced by the operand do not match the VMCS + revision identifier supported by this processor. + + But I can't find any cues from the KVM source code. It seems each + error condition is impossible in theory. :( + +Any suggestions will be appreciated! Paolo? + +-- +Regards, +-Gonglei + +On 10/11/2016 15:10, gong lei wrote: +> +4) The issue can't be reproduced. I searched the Intel VMX spec about +> +reasons +> +for vmptrld failure: +> +The instruction fails if its operand is not properly aligned, sets +> +unsupported physical-address bits, or is equal to the VMXON +> +pointer.
In addition, the instruction fails if the 32 bits in memory +> +referenced by the operand do not match the VMCS +> +revision identifier supported by this processor. +> +> +But I can't find any cues from the KVM source code. It seems each +> +error condition is impossible in theory. :( +Yes, it should not happen. :( + +If it's not reproducible, it's really hard to say what it was, except a +random memory corruption elsewhere or even a bit flip (!). + +Paolo + +On 2016/11/17 20:39, Paolo Bonzini wrote: +> +> +On 10/11/2016 15:10, gong lei wrote: +> +> 4) The issue can't be reproduced. I searched the Intel VMX spec about +> +> reasons +> +> for vmptrld failure: +> +> The instruction fails if its operand is not properly aligned, sets +> +> unsupported physical-address bits, or is equal to the VMXON +> +> pointer.
+> +> +Paolo +Thanks for your reply, Paolo :) + +-- +Regards, +-Gonglei + diff --git a/classification_output/01/mistranslation/26430026 b/classification_output/01/mistranslation/26430026 new file mode 100644 index 00000000..ead1f32f --- /dev/null +++ b/classification_output/01/mistranslation/26430026 @@ -0,0 +1,165 @@ +mistranslation: 0.915 +semantic: 0.904 +instruction: 0.888 +other: 0.813 + +[BUG] cxl,i386: e820 mappings may not be correct for cxl + +Context included below from prior discussion + - `cxl create-region` would fail on inability to allocate memory + - traced this down to the memory region being marked RESERVED + - E820 map marks the CXL fixed memory window as RESERVED + + +Re: x86 errors, I found that region worked with this patch. (I also +added the SRAT patches the Davidlohr posted, but I do not think they are +relevant). + +I don't think this is correct, and setting this to E820_RAM causes the +system to fail to boot at all, but with this change `cxl create-region` +succeeds, which suggests our e820 mappings in the i386 machine are +incorrect. + +Anyone who can help or have an idea as to what e820 should actually be +doing with this region, or if this is correct and something else is +failing, please help! + + +diff --git a/hw/i386/pc.c b/hw/i386/pc.c +index 566accf7e6..a5e688a742 100644 +--- a/hw/i386/pc.c ++++ b/hw/i386/pc.c +@@ -1077,7 +1077,7 @@ void pc_memory_init(PCMachineState *pcms, + memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw, + "cxl-fixed-memory-region", fw->size); + memory_region_add_subregion(system_memory, fw->base, &fw->mr); +- e820_add_entry(fw->base, fw->size, E820_RESERVED); ++ e820_add_entry(fw->base, fw->size, E820_NVS); + cxl_fmw_base += fw->size; + cxl_resv_end = cxl_fmw_base; + } + + +On Mon, Oct 10, 2022 at 05:32:42PM +0100, Jonathan Cameron wrote: +> +> +> > but i'm not sure of what to do with this info. 
We have some proof +> +> > that real hardware works with this no problem, and the only difference +> +> > is that the EFI/bios/firmware is setting the memory regions as `usable` +> +> > or `soft reserved`, which would imply the EDK2 is the blocker here +> +> > regardless of the OS driver status. +> +> > +> +> > But I'd seen elsewhere you had gotten some of this working, and I'm +> +> > failing to get anything working at the moment. If you have any input i +> +> > would greatly appreciate the help. +> +> > +> +> > QEMU config: +> +> > +> +> > /opt/qemu-cxl2/bin/qemu-system-x86_64 \ +> +> > -drive +> +> > file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=d\ +> +> > -m 2G,slots=4,maxmem=4G \ +> +> > -smp 4 \ +> +> > -machine type=q35,accel=kvm,cxl=on \ +> +> > -enable-kvm \ +> +> > -nographic \ +> +> > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \ +> +> > -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 \ +> +> > -object memory-backend-file,id=cxl-mem0,mem-path=/tmp/cxl-mem0,size=256M \ +> +> > -object memory-backend-file,id=lsa0,mem-path=/tmp/cxl-lsa0,size=256M \ +> +> > -device cxl-type3,bus=rp0,pmem=true,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 +> +> > \ +> +> > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=256M +> +> > +> +> > I'd seen on the lists that you had seen issues with single-rp setups, +> +> > but no combination of configuration I've tried (including all the ones +> +> > in the docs and tests) lead to a successful region creation with +> +> > `cxl create-region` +> +> +> +> Hmm. Let me have a play. I've not run x86 tests for a while so +> +> perhaps something is missing there. +> +> +> +> I'm carrying a patch to override check_last_peer() in +> +> cxl_port_setup_targets() as that is wrong for some combinations, +> +> but that doesn't look like it's related to what you are seeing. +> +> +I'm not sure if it's relevant, but turned out I'd forgotten I'm carrying 3 +> +patches that aren't upstream (and one is a horrible hack). 
+> +> +Hack: +https://lore.kernel.org/linux-cxl/20220819094655.000005ed@huawei.com/ +> +Shouldn't affect a simple case like this... +> +> +https://lore.kernel.org/linux-cxl/20220819093133.00006c22@huawei.com/T/#t +> +(Dan's version) +> +> +https://lore.kernel.org/linux-cxl/20220815154044.24733-1-Jonathan.Cameron@huawei.com/T/#t +> +> +For writes to work you will currently need two rps (nothing on the second is +> +fine) +> +as we still haven't resolved if the kernel should support an HDM decoder on +> +a host bridge with one port. I think it should (Spec allows it), others +> +unconvinced. +> +> +Note I haven't shifted over to x86 yet so may still be something different +> +from +> +arm64. +> +> +Jonathan +> +> + diff --git a/classification_output/01/mistranslation/36568044 b/classification_output/01/mistranslation/36568044 new file mode 100644 index 00000000..719c03c7 --- /dev/null +++ b/classification_output/01/mistranslation/36568044 @@ -0,0 +1,4581 @@ +mistranslation: 0.962 +instruction: 0.930 +other: 0.930 +semantic: 0.923 + +[BUG, RFC] cpr-transfer: qxl guest driver crashes after migration + +Hi all, + +We've been experimenting with cpr-transfer migration mode recently and +have discovered the following issue with the guest QXL driver: + +Run migration source: +> +EMULATOR=/path/to/emulator +> +ROOTFS=/path/to/image +> +QMPSOCK=/var/run/alma8qmp-src.sock +> +> +$EMULATOR -enable-kvm \ +> +-machine q35 \ +> +-cpu host -smp 2 -m 2G \ +> +-object +> +memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ +> +-machine memory-backend=ram0 \ +> +-machine aux-ram-share=on \ +> +-drive file=$ROOTFS,media=disk,if=virtio \ +> +-qmp unix:$QMPSOCK,server=on,wait=off \ +> +-nographic \ +> +-device qxl-vga +Run migration target: +> +EMULATOR=/path/to/emulator +> +ROOTFS=/path/to/image +> +QMPSOCK=/var/run/alma8qmp-dst.sock +> +> +> +> +$EMULATOR -enable-kvm \ +> +-machine q35 \ +> +-cpu host -smp 2 -m 2G \ +> +-object +> 
+memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ +> +-machine memory-backend=ram0 \ +> +-machine aux-ram-share=on \ +> +-drive file=$ROOTFS,media=disk,if=virtio \ +> +-qmp unix:$QMPSOCK,server=on,wait=off \ +> +-nographic \ +> +-device qxl-vga \ +> +-incoming tcp:0:44444 \ +> +-incoming '{"channel-type": "cpr", "addr": { "transport": "socket", +> +"type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' +Launch the migration: +> +QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +> +QMPSOCK=/var/run/alma8qmp-src.sock +> +> +$QMPSHELL -p $QMPSOCK <<EOF +migrate-set-parameters mode=cpr-transfer +> +migrate +> +channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}] +> +EOF +Then, after a while, the QXL guest driver on the target crashes, spewing the +following messages: +> +[ 73.962002] [TTM] Buffer eviction failed +> +[ 73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001) +> +[ 73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate +> +VRAM BO +That seems to be a known kernel QXL driver bug: +https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ +https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ +(the latter discussion contains a reproduce script which speeds up +the crash in the guest): +> +#!/bin/bash +> +> +chvt 3 +> +> +for j in $(seq 80); do +> +echo "$(date) starting round $j" +> +if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" +> +]; then +> +echo "bug was reproduced after $j tries" +> +exit 1 +> +fi +> +for i in $(seq 100); do +> +dmesg > /dev/tty3 +> +done +> +done +> +> +echo "bug could not be reproduced" +> +exit 0 +The bug itself seems to remain unfixed, as I was able to reproduce that +with a Fedora 41 guest, as well as an AlmaLinux 8 guest.
+However, our
+cpr-transfer code also seems to be buggy, as it triggers the crash:
+without the cpr-transfer migration, the above reproducer doesn't lead to
+a crash on the source VM.
+
+I suspect that, as cpr-transfer doesn't migrate the guest memory, but
+rather passes it through the memory backend object, our code might
+somehow corrupt the VRAM. However, I wasn't able to trace the
+corruption so far.
+
+Could somebody help the investigation and take a look into this? Any
+suggestions would be appreciated. Thanks!
+
+Andrey
+
+On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+> [snip]
+
+Possibly some memory region created by qxl is not being preserved.
+Try adding these traces to see what is preserved:
+
+-trace enable='*cpr*'
+-trace enable='*ram_alloc*'
+
+- Steve
+
+On 2/28/2025 1:13 PM, Steven Sistare wrote:
+> [snip]
+
+Also try adding this patch to see if it flags any ram blocks as not
+compatible with cpr. A message is printed at migration start time.
+https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email-steven.sistare@oracle.com/
+
+- Steve
+
+On 2/28/25 8:20 PM, Steven Sistare wrote:
+> [snip]
+
+With the traces enabled + the "migration: ram block cpr blockers" patch
+applied:
+
+Source:
+> cpr_find_fd pc.bios, id 0 returns -1
+> cpr_save_fd pc.bios, id 0, fd 22
+> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host 0x7fec18e00000
+> cpr_find_fd pc.rom, id 0 returns -1
+> cpr_save_fd pc.rom, id 0, fd 23
+> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host 0x7fec18c00000
+> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1
+> cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24
+> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd 24 host 0x7fec18a00000
+> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1
+> cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25
+> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 fd 25 host 0x7feb77e00000
+> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1
+> cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27
+> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 27 host 0x7fec18800000
+> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1
+> cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28
+> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 fd 28 host 0x7feb73c00000
+> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1
+> cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34
+> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 34 host 0x7fec18600000
+> cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1
+> cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35
+> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 35 host 0x7fec18200000
+> cpr_find_fd /rom@etc/table-loader, id 0 returns -1
+> cpr_save_fd /rom@etc/table-loader, id 0, fd 36
+> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 36 host 0x7feb8b600000
+> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1
+> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37
+> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 37 host 0x7feb8b400000
+>
+> cpr_state_save cpr-transfer mode
+> cpr_transfer_output /var/run/alma8cpr-dst.sock
+
+Target:
+> cpr_transfer_input /var/run/alma8cpr-dst.sock
+> cpr_state_load cpr-transfer mode
+> cpr_find_fd pc.bios, id 0 returns 20
+> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host 0x7fcdc9800000
+> cpr_find_fd pc.rom, id 0 returns 19
+> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host 0x7fcdc9600000
+> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18
+> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd 18 host 0x7fcdc9400000
+> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17
+> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 fd 17 host 0x7fcd27e00000
+> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16
+> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 16 host 0x7fcdc9200000
+> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15
+> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 fd 15 host 0x7fcd23c00000
+> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14
+> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 14 host 0x7fcdc8800000
+> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13
+> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 13 host 0x7fcdc8400000
+> cpr_find_fd /rom@etc/table-loader, id 0 returns 11
+> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 11 host 0x7fcdc8200000
+> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10
+> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 10 host 0x7fcd3be00000
+
+Looks like both vga.vram and qxl.vram are being preserved (with the same
+addresses), and no incompatible ram blocks are found during migration.
+
+Andrey
+
+On 2/28/25 8:35 PM, Andrey Drobyshev wrote:
+> [snip]
+
+Sorry, addresses are not the same, of course. However, the corresponding ram
+blocks do seem to be preserved and initialized.
+
+On 2/28/2025 1:37 PM, Andrey Drobyshev wrote:
+> [snip]
+
+So far, I have not reproduced the guest driver failure.
+
+However, I have isolated places where new QEMU improperly writes to
+the qxl memory regions prior to starting the guest, by mmap'ing them
+readonly after cpr:
+
+  qemu_ram_alloc_internal()
+    if (reused && (strstr(name, "qxl") || strstr(name, "vga")))
+        ram_flags |= RAM_READONLY;
+    new_block = qemu_ram_alloc_from_fd(...)
+
+I have attached a draft fix; try it and let me know.
+My console window looks fine before and after cpr, using
+-vnc $hostip:0 -vga qxl
+
+- Steve
+
+0001-hw-qxl-cpr-support-preliminary.patch
+Description: Text document
+
+On 3/4/25 9:05 PM, Steven Sistare wrote:
+> [snip]
35 host 0x7fec18200000 +> +>>> cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +> +>>> cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +> +>>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +> +>>> fd 36 host 0x7feb8b600000 +> +>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +> +>>> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 +> +>>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +> +>>> 37 host 0x7feb8b400000 +> +>>> +> +>>> cpr_state_save cpr-transfer mode +> +>>> cpr_transfer_output /var/run/alma8cpr-dst.sock +> +>> +> +>> Target: +> +>>> cpr_transfer_input /var/run/alma8cpr-dst.sock +> +>>> cpr_state_load cpr-transfer mode +> +>>> cpr_find_fd pc.bios, id 0 returns 20 +> +>>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +> +>>> 0x7fcdc9800000 +> +>>> cpr_find_fd pc.rom, id 0 returns 19 +> +>>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +> +>>> 0x7fcdc9600000 +> +>>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +> +>>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +> +>>> 262144 fd 18 host 0x7fcdc9400000 +> +>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +> +>>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +> +>>> 67108864 fd 17 host 0x7fcd27e00000 +> +>>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 +> +>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +> +>>> fd 16 host 0x7fcdc9200000 +> +>>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 +> +>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +> +>>> 67108864 fd 15 host 0x7fcd23c00000 +> +>>> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 +> +>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +> +>>> fd 14 host 0x7fcdc8800000 +> +>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 +> +>>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +> +>>> 2097152 fd 13 host 
0x7fcdc8400000 +> +>>> cpr_find_fd /rom@etc/table-loader, id 0 returns 11 +> +>>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +> +>>> fd 11 host 0x7fcdc8200000 +> +>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 +> +>>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +> +>>> 10 host 0x7fcd3be00000 +> +>> +> +>> Looks like both vga.vram and qxl.vram are being preserved (with the same +> +>> addresses), and no incompatible ram blocks are found during migration. +> +> +> +> Sorry, addressed are not the same, of course.  However corresponding ram +> +> blocks do seem to be preserved and initialized. +> +> +So far, I have not reproduced the guest driver failure. +> +> +However, I have isolated places where new QEMU improperly writes to +> +the qxl memory regions prior to starting the guest, by mmap'ing them +> +readonly after cpr: +> +> +  qemu_ram_alloc_internal() +> +    if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) +> +        ram_flags |= RAM_READONLY; +> +    new_block = qemu_ram_alloc_from_fd(...) +> +> +I have attached a draft fix; try it and let me know. +> +My console window looks fine before and after cpr, using +> +-vnc $hostip:0 -vga qxl +> +> +- Steve +Regarding the reproduce: when I launch the buggy version with the same +options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, +my VNC client silently hangs on the target after a while. Could it +happen on your stand as well? Could you try launching VM with +"-nographic -device qxl-vga"? That way VM's serial console is given you +directly in the shell, so when qxl driver crashes you're still able to +inspect the kernel messages. + +As for your patch, I can report that it doesn't resolve the issue as it +is. But I was able to track down another possible memory corruption +using your approach with readonly mmap'ing: + +> +Program terminated with signal SIGSEGV, Segmentation fault. 
+> +#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +> +412 d->ram->magic = cpu_to_le32(QXL_RAM_MAGIC); +> +[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] +> +(gdb) bt +> +#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +> +#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, +> +errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 +> +#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, +> +errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 +> +#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, +> +errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 +> +#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, +> +value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 +> +#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, +> +v=0x5638996f3770, name=0x56389759b141 "realized", opaque=0x5638987893d0, +> +errp=0x7ffd3c2b84e0) +> +at ../qom/object.c:2374 +> +#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, +> +name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) +> +at ../qom/object.c:1449 +> +#7 0x00005638970f8586 in object_property_set_qobject (obj=0x5638996e0e70, +> +name=0x56389759b141 "realized", value=0x5638996df900, errp=0x7ffd3c2b84e0) +> +at ../qom/qom-qobject.c:28 +> +#8 0x00005638970f3d8d in object_property_set_bool (obj=0x5638996e0e70, +> +name=0x56389759b141 "realized", value=true, errp=0x7ffd3c2b84e0) +> +at ../qom/object.c:1519 +> +#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, +> +bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 +> +#10 0x0000563896dba675 in qdev_device_add_from_qdict (opts=0x5638996dfe50, +> +from_json=false, errp=0x7ffd3c2b84e0) at ../system/qdev-monitor.c:714 +> +#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, +> +errp=0x56389855dc40 ) at ../system/qdev-monitor.c:733 +> +#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, opts=0x563898786150, +> 
+errp=0x56389855dc40 ) at ../system/vl.c:1207
+> #13 0x000056389737a6cc in qemu_opts_foreach
+> (list=0x563898427b60 , func=0x563896dc48ca
+> , opaque=0x0, errp=0x56389855dc40 )
+> at ../util/qemu-option.c:1135
+> #14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/vl.c:2745
+> #15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40
+> ) at ../system/vl.c:2806
+> #16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) at
+> ../system/vl.c:3838
+> #17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at
+> ../system/main.c:72
+So the attached adjusted version of your patch does seem to help.  At
+least I can't reproduce the crash on my stand.
+
+I'm wondering, could it be useful to explicitly mark all the reused
+memory regions readonly upon cpr-transfer, and then make them writable
+back again after the migration is done?  That way we will be segfaulting
+early on instead of debugging tricky memory corruptions.
+
+Andrey
+0001-hw-qxl-cpr-support-preliminary.patch
+Description:
+Text Data
+
+On 3/5/2025 11:50 AM, Andrey Drobyshev wrote:
+Regarding the reproduce: when I launch the buggy version with the same
+options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer,
+my VNC client silently hangs on the target after a while.  Could it
+happen on your stand as well?
+cpr does not preserve the vnc connection and session.  To test, I specify
+port 0 for the source VM and port 1 for the dest.  When the src vnc goes
+dormant the dest vnc becomes active.
+Could you try launching VM with
+"-nographic -device qxl-vga"?  That way VM's serial console is given you
+directly in the shell, so when qxl driver crashes you're still able to
+inspect the kernel messages.
+I have been running like that, but have not reproduced the qxl driver crash,
+and I suspect my guest image+kernel is too old.  However, once I realized the
+issue was post-cpr modification of qxl memory, I switched my attention to the
+fix.
+As for your patch, I can report that it doesn't resolve the issue as it
+is.  But I was able to track down another possible memory corruption
+using your approach with readonly mmap'ing:
+Program terminated with signal SIGSEGV, Segmentation fault.
+So the attached adjusted version of your patch does seem to help.  At
+least I can't reproduce the crash on my stand.
+Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram
+are definitely harmful.  Try V2 of the patch, attached, which skips the
+lines of init_qxl_ram that modify guest memory.
+I'm wondering, could it be useful to explicitly mark all the reused
+memory regions readonly upon cpr-transfer, and then make them writable
+back again after the migration is done?  That way we will be segfaulting
+early on instead of debugging tricky memory corruptions.
+It's a useful debugging technique, but changing protection on a large
+memory region can be too expensive for production due to TLB shootdowns.
+
+Also, there are cases where writes are performed but the value is guaranteed to
+be the same:
+  qxl_post_load()
+    qxl_set_mode()
+      d->rom->mode = cpu_to_le32(modenr);
+The value is the same because mode and shadow_rom.mode were passed in vmstate
+from old qemu.
+
+- Steve
+0001-hw-qxl-cpr-support-preliminary-V2.patch
+Description:
+Text document
+
+On 3/5/25 22:19, Steven Sistare wrote:
+I'm wondering, could it be useful to explicitly mark all the reused
+memory regions readonly upon cpr-transfer, and then make them writable
+back again after the migration is done?  That way we will be segfaulting
+early on instead of debugging tricky memory corruptions.
+It's a useful debugging technique, but changing protection on a large
+memory region can be too expensive for production due to TLB shootdowns.
+Good point.
Though we could move this code under non-default option to +avoid re-writing. + +Den + +On 3/5/25 11:19 PM, Steven Sistare wrote: +> +On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: +> +> On 3/4/25 9:05 PM, Steven Sistare wrote: +> +>> On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: +> +>>> On 2/28/25 8:35 PM, Andrey Drobyshev wrote: +> +>>>> On 2/28/25 8:20 PM, Steven Sistare wrote: +> +>>>>> On 2/28/2025 1:13 PM, Steven Sistare wrote: +> +>>>>>> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: +> +>>>>>>> Hi all, +> +>>>>>>> +> +>>>>>>> We've been experimenting with cpr-transfer migration mode recently +> +>>>>>>> and +> +>>>>>>> have discovered the following issue with the guest QXL driver: +> +>>>>>>> +> +>>>>>>> Run migration source: +> +>>>>>>>> EMULATOR=/path/to/emulator +> +>>>>>>>> ROOTFS=/path/to/image +> +>>>>>>>> QMPSOCK=/var/run/alma8qmp-src.sock +> +>>>>>>>> +> +>>>>>>>> $EMULATOR -enable-kvm \ +> +>>>>>>>>        -machine q35 \ +> +>>>>>>>>        -cpu host -smp 2 -m 2G \ +> +>>>>>>>>        -object memory-backend-file,id=ram0,size=2G,mem-path=/ +> +>>>>>>>> dev/shm/ +> +>>>>>>>> ram0,share=on\ +> +>>>>>>>>        -machine memory-backend=ram0 \ +> +>>>>>>>>        -machine aux-ram-share=on \ +> +>>>>>>>>        -drive file=$ROOTFS,media=disk,if=virtio \ +> +>>>>>>>>        -qmp unix:$QMPSOCK,server=on,wait=off \ +> +>>>>>>>>        -nographic \ +> +>>>>>>>>        -device qxl-vga +> +>>>>>>> +> +>>>>>>> Run migration target: +> +>>>>>>>> EMULATOR=/path/to/emulator +> +>>>>>>>> ROOTFS=/path/to/image +> +>>>>>>>> QMPSOCK=/var/run/alma8qmp-dst.sock +> +>>>>>>>> $EMULATOR -enable-kvm \ +> +>>>>>>>>        -machine q35 \ +> +>>>>>>>>        -cpu host -smp 2 -m 2G \ +> +>>>>>>>>        -object memory-backend-file,id=ram0,size=2G,mem-path=/ +> +>>>>>>>> dev/shm/ +> +>>>>>>>> ram0,share=on\ +> +>>>>>>>>        -machine memory-backend=ram0 \ +> +>>>>>>>>        -machine aux-ram-share=on \ +> +>>>>>>>>        -drive file=$ROOTFS,media=disk,if=virtio \ +> +>>>>>>>>   
     -qmp unix:$QMPSOCK,server=on,wait=off \ +> +>>>>>>>>        -nographic \ +> +>>>>>>>>        -device qxl-vga \ +> +>>>>>>>>        -incoming tcp:0:44444 \ +> +>>>>>>>>        -incoming '{"channel-type": "cpr", "addr": { "transport": +> +>>>>>>>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' +> +>>>>>>> +> +>>>>>>> +> +>>>>>>> Launch the migration: +> +>>>>>>>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +> +>>>>>>>> QMPSOCK=/var/run/alma8qmp-src.sock +> +>>>>>>>> +> +>>>>>>>> $QMPSHELL -p $QMPSOCK < +>>>>>>>>        migrate-set-parameters mode=cpr-transfer +> +>>>>>>>>        migrate channels=[{"channel-type":"main","addr": +> +>>>>>>>> {"transport":"socket","type":"inet","host":"0","port":"44444"}}, +> +>>>>>>>> {"channel-type":"cpr","addr": +> +>>>>>>>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr- +> +>>>>>>>> dst.sock"}}] +> +>>>>>>>> EOF +> +>>>>>>> +> +>>>>>>> Then, after a while, QXL guest driver on target crashes spewing the +> +>>>>>>> following messages: +> +>>>>>>>> [   73.962002] [TTM] Buffer eviction failed +> +>>>>>>>> [   73.962072] qxl 0000:00:02.0: object_init failed for (3149824, +> +>>>>>>>> 0x00000001) +> +>>>>>>>> [   73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to +> +>>>>>>>> allocate VRAM BO +> +>>>>>>> +> +>>>>>>> That seems to be a known kernel QXL driver bug: +> +>>>>>>> +> +>>>>>>> +https://lore.kernel.org/all/20220907094423.93581-1- +> +>>>>>>> min_halo@163.com/T/ +> +>>>>>>> +https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ +> +>>>>>>> +> +>>>>>>> (the latter discussion contains that reproduce script which +> +>>>>>>> speeds up +> +>>>>>>> the crash in the guest): +> +>>>>>>>> #!/bin/bash +> +>>>>>>>> +> +>>>>>>>> chvt 3 +> +>>>>>>>> +> +>>>>>>>> for j in $(seq 80); do +> +>>>>>>>>            echo "$(date) starting round $j" +> +>>>>>>>>            if [ "$(journalctl --boot | grep "failed to allocate +> +>>>>>>>> VRAM +> +>>>>>>>> BO")" != "" ]; then +> 
+>>>>>>>>                    echo "bug was reproduced after $j tries" +> +>>>>>>>>                    exit 1 +> +>>>>>>>>            fi +> +>>>>>>>>            for i in $(seq 100); do +> +>>>>>>>>                    dmesg > /dev/tty3 +> +>>>>>>>>            done +> +>>>>>>>> done +> +>>>>>>>> +> +>>>>>>>> echo "bug could not be reproduced" +> +>>>>>>>> exit 0 +> +>>>>>>> +> +>>>>>>> The bug itself seems to remain unfixed, as I was able to reproduce +> +>>>>>>> that +> +>>>>>>> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +> +>>>>>>> cpr-transfer code also seems to be buggy as it triggers the crash - +> +>>>>>>> without the cpr-transfer migration the above reproduce doesn't +> +>>>>>>> lead to +> +>>>>>>> crash on the source VM. +> +>>>>>>> +> +>>>>>>> I suspect that, as cpr-transfer doesn't migrate the guest +> +>>>>>>> memory, but +> +>>>>>>> rather passes it through the memory backend object, our code might +> +>>>>>>> somehow corrupt the VRAM.  However, I wasn't able to trace the +> +>>>>>>> corruption so far. +> +>>>>>>> +> +>>>>>>> Could somebody help the investigation and take a look into +> +>>>>>>> this?  Any +> +>>>>>>> suggestions would be appreciated.  Thanks! +> +>>>>>> +> +>>>>>> Possibly some memory region created by qxl is not being preserved. +> +>>>>>> Try adding these traces to see what is preserved: +> +>>>>>> +> +>>>>>> -trace enable='*cpr*' +> +>>>>>> -trace enable='*ram_alloc*' +> +>>>>> +> +>>>>> Also try adding this patch to see if it flags any ram blocks as not +> +>>>>> compatible with cpr.  A message is printed at migration start time. 
+> +>>>>>    +https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- +> +>>>>> email- +> +>>>>> steven.sistare@oracle.com/ +> +>>>>> +> +>>>>> - Steve +> +>>>>> +> +>>>> +> +>>>> With the traces enabled + the "migration: ram block cpr blockers" +> +>>>> patch +> +>>>> applied: +> +>>>> +> +>>>> Source: +> +>>>>> cpr_find_fd pc.bios, id 0 returns -1 +> +>>>>> cpr_save_fd pc.bios, id 0, fd 22 +> +>>>>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host +> +>>>>> 0x7fec18e00000 +> +>>>>> cpr_find_fd pc.rom, id 0 returns -1 +> +>>>>> cpr_save_fd pc.rom, id 0, fd 23 +> +>>>>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host +> +>>>>> 0x7fec18c00000 +> +>>>>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 +> +>>>>> cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 +> +>>>>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +> +>>>>> 262144 fd 24 host 0x7fec18a00000 +> +>>>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 +> +>>>>> cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +> +>>>>> 67108864 fd 25 host 0x7feb77e00000 +> +>>>>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 +> +>>>>> cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +> +>>>>> fd 27 host 0x7fec18800000 +> +>>>>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 +> +>>>>> cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +> +>>>>> 67108864 fd 28 host 0x7feb73c00000 +> +>>>>> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 +> +>>>>> cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +> +>>>>> fd 34 host 0x7fec18600000 +> +>>>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 +> +>>>>> cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 
+> +>>>>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +> +>>>>> 2097152 fd 35 host 0x7fec18200000 +> +>>>>> cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +> +>>>>> cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +> +>>>>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +> +>>>>> fd 36 host 0x7feb8b600000 +> +>>>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +> +>>>>> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 +> +>>>>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +> +>>>>> 37 host 0x7feb8b400000 +> +>>>>> +> +>>>>> cpr_state_save cpr-transfer mode +> +>>>>> cpr_transfer_output /var/run/alma8cpr-dst.sock +> +>>>> +> +>>>> Target: +> +>>>>> cpr_transfer_input /var/run/alma8cpr-dst.sock +> +>>>>> cpr_state_load cpr-transfer mode +> +>>>>> cpr_find_fd pc.bios, id 0 returns 20 +> +>>>>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +> +>>>>> 0x7fcdc9800000 +> +>>>>> cpr_find_fd pc.rom, id 0 returns 19 +> +>>>>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +> +>>>>> 0x7fcdc9600000 +> +>>>>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +> +>>>>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +> +>>>>> 262144 fd 18 host 0x7fcdc9400000 +> +>>>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +> +>>>>> 67108864 fd 17 host 0x7fcd27e00000 +> +>>>>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +> +>>>>> fd 16 host 0x7fcdc9200000 +> +>>>>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +> +>>>>> 67108864 fd 15 host 0x7fcd23c00000 +> +>>>>> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +> +>>>>> fd 14 host 
0x7fcdc8800000 +> +>>>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 +> +>>>>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +> +>>>>> 2097152 fd 13 host 0x7fcdc8400000 +> +>>>>> cpr_find_fd /rom@etc/table-loader, id 0 returns 11 +> +>>>>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +> +>>>>> fd 11 host 0x7fcdc8200000 +> +>>>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 +> +>>>>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +> +>>>>> 10 host 0x7fcd3be00000 +> +>>>> +> +>>>> Looks like both vga.vram and qxl.vram are being preserved (with the +> +>>>> same +> +>>>> addresses), and no incompatible ram blocks are found during migration. +> +>>> +> +>>> Sorry, addressed are not the same, of course.  However corresponding +> +>>> ram +> +>>> blocks do seem to be preserved and initialized. +> +>> +> +>> So far, I have not reproduced the guest driver failure. +> +>> +> +>> However, I have isolated places where new QEMU improperly writes to +> +>> the qxl memory regions prior to starting the guest, by mmap'ing them +> +>> readonly after cpr: +> +>> +> +>>    qemu_ram_alloc_internal() +> +>>      if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) +> +>>          ram_flags |= RAM_READONLY; +> +>>      new_block = qemu_ram_alloc_from_fd(...) +> +>> +> +>> I have attached a draft fix; try it and let me know. +> +>> My console window looks fine before and after cpr, using +> +>> -vnc $hostip:0 -vga qxl +> +>> +> +>> - Steve +> +> +> +> Regarding the reproduce: when I launch the buggy version with the same +> +> options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, +> +> my VNC client silently hangs on the target after a while.  Could it +> +> happen on your stand as well? +> +> +cpr does not preserve the vnc connection and session.  To test, I specify +> +port 0 for the source VM and port 1 for the dest.  When the src vnc goes +> +dormant the dest vnc becomes active. 
+> +Sure, I meant that VNC on the dest (on the port 1) works for a while +after the migration and then hangs, apparently after the guest QXL crash. + +> +> Could you try launching VM with +> +> "-nographic -device qxl-vga"?  That way VM's serial console is given you +> +> directly in the shell, so when qxl driver crashes you're still able to +> +> inspect the kernel messages. +> +> +I have been running like that, but have not reproduced the qxl driver +> +crash, +> +and I suspect my guest image+kernel is too old. +Yes, that's probably the case. But the crash occurs on my Fedora 41 +guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to +be buggy. + + +> +However, once I realized the +> +issue was post-cpr modification of qxl memory, I switched my attention +> +to the +> +fix. +> +> +> As for your patch, I can report that it doesn't resolve the issue as it +> +> is.  But I was able to track down another possible memory corruption +> +> using your approach with readonly mmap'ing: +> +> +> +>> Program terminated with signal SIGSEGV, Segmentation fault. 
+> +>> #0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +> +>> 412         d->ram->magic       = cpu_to_le32(QXL_RAM_MAGIC); +> +>> [Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] +> +>> (gdb) bt +> +>> #0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +> +>> #1  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, +> +>> errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 +> +>> #2  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, +> +>> errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 +> +>> #3  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, +> +>> errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 +> +>> #4  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, +> +>> value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 +> +>> #5  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, +> +>> v=0x5638996f3770, name=0x56389759b141 "realized", +> +>> opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) +> +>>      at ../qom/object.c:2374 +> +>> #6  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, +> +>> name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) +> +>>      at ../qom/object.c:1449 +> +>> #7  0x00005638970f8586 in object_property_set_qobject +> +>> (obj=0x5638996e0e70, name=0x56389759b141 "realized", +> +>> value=0x5638996df900, errp=0x7ffd3c2b84e0) +> +>>      at ../qom/qom-qobject.c:28 +> +>> #8  0x00005638970f3d8d in object_property_set_bool +> +>> (obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, +> +>> errp=0x7ffd3c2b84e0) +> +>>      at ../qom/object.c:1519 +> +>> #9  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, +> +>> bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 +> +>> #10 0x0000563896dba675 in qdev_device_add_from_qdict +> +>> (opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ +> +>> system/qdev-monitor.c:714 +> +>> #11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, 
+> +>> errp=0x56389855dc40 ) at ../system/qdev-monitor.c:733 +> +>> #12 0x0000563896dc48f1 in device_init_func (opaque=0x0, +> +>> opts=0x563898786150, errp=0x56389855dc40 ) at ../system/ +> +>> vl.c:1207 +> +>> #13 0x000056389737a6cc in qemu_opts_foreach +> +>>      (list=0x563898427b60 , func=0x563896dc48ca +> +>> , opaque=0x0, errp=0x56389855dc40 ) +> +>>      at ../util/qemu-option.c:1135 +> +>> #14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ +> +>> vl.c:2745 +> +>> #15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 +> +>> ) at ../system/vl.c:2806 +> +>> #16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) +> +>> at ../system/vl.c:3838 +> +>> #17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ +> +>> system/main.c:72 +> +> +> +> So the attached adjusted version of your patch does seem to help.  At +> +> least I can't reproduce the crash on my stand. +> +> +Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram +> +are +> +definitely harmful.  Try V2 of the patch, attached, which skips the lines +> +of init_qxl_ram that modify guest memory. +> +Thanks, your v2 patch does seem to prevent the crash. Would you re-send +it to the list as a proper fix? + +> +> I'm wondering, could it be useful to explicitly mark all the reused +> +> memory regions readonly upon cpr-transfer, and then make them writable +> +> back again after the migration is done?  That way we will be segfaulting +> +> early on instead of debugging tricky memory corruptions. +> +> +It's a useful debugging technique, but changing protection on a large +> +memory region +> +can be too expensive for production due to TLB shootdowns. 
+>
+>
+Also, there are cases where writes are performed but the value is
+>
+guaranteed to
+>
+be the same:
+>
+  qxl_post_load()
+>
+    qxl_set_mode()
+>
+      d->rom->mode = cpu_to_le32(modenr);
+>
+The value is the same because mode and shadow_rom.mode were passed in
+>
+vmstate
+>
+from old qemu.
+>
+There are also cases where a device's ROM might be re-initialized. E.g.
+this segfault occurs upon further exploration of RO-mapped RAM blocks:
+
+>
+Program terminated with signal SIGSEGV, Segmentation fault.
+>
+#0 __memmove_avx_unaligned_erms () at
+>
+../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664
+>
+664 rep movsb
+>
+[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))]
+>
+(gdb) bt
+>
+#0 __memmove_avx_unaligned_erms () at
+>
+../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664
+>
+#1 0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380,
+>
+owner=0x55aa2019ac10, name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true)
+>
+at ../hw/core/loader.c:1032
+>
+#2 0x000055aa1d031577 in rom_add_blob
+>
+(name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072,
+>
+max_len=2097152, addr=18446744073709551615, fw_file_name=0x55aa1da51f13
+>
+"etc/acpi/tables", fw_callback=0x55aa1d441f59 ,
+>
+callback_opaque=0x55aa20ff0010, as=0x0, read_only=true) at
+>
+../hw/core/loader.c:1147
+>
+#3 0x000055aa1cfd788d in acpi_add_rom_blob
+>
+(update=0x55aa1d441f59 , opaque=0x55aa20ff0010,
+>
+blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at
+>
+../hw/acpi/utils.c:46
+>
+#4 0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720
+>
+#5 0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0)
+>
+at ../hw/i386/pc.c:638
+>
+#6 0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10
+>
+, data=0x0) at ../util/notify.c:39
+>
+#7 0x000055aa1d039ee5 in qdev_machine_creation_done () at
+>
+../hw/core/machine.c:1749
+>
+#8 0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40
+>
+) at
../system/vl.c:2779
+>
+#9 0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40
+>
+) at ../system/vl.c:2807
+>
+#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at
+>
+../system/vl.c:3838
+>
+#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at
+>
+../system/main.c:72
+I'm not sure whether the ACPI tables ROM in particular is rewritten with
+the same content, but there might be cases where a ROM is re-read from
+the file system upon initialization. That is undesirable, as the guest
+kernel certainly won't be too happy about a sudden change of the
+device's ROM content.
+
+So the issue we're dealing with here is any unwanted memory-related
+device initialization upon cpr.
+
+For now the only thing that comes to my mind is to make a test where we
+put as many devices as we can into a VM, make ram blocks RO upon cpr
+(and remap them as RW later after migration is done, if needed), and
+catch any unwanted memory violations. As Den suggested, we might
+consider adding that behaviour as a separate non-default option (or
+"migrate" command flag specific to cpr-transfer), which would only be
+used in testing.
+ +Andrey + +On 3/6/25 16:16, Andrey Drobyshev wrote: +On 3/5/25 11:19 PM, Steven Sistare wrote: +On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: +On 3/4/25 9:05 PM, Steven Sistare wrote: +On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: +On 2/28/25 8:35 PM, Andrey Drobyshev wrote: +On 2/28/25 8:20 PM, Steven Sistare wrote: +On 2/28/2025 1:13 PM, Steven Sistare wrote: +On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: +Hi all, + +We've been experimenting with cpr-transfer migration mode recently +and +have discovered the following issue with the guest QXL driver: + +Run migration source: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-src.sock + +$EMULATOR -enable-kvm \ +        -machine q35 \ +        -cpu host -smp 2 -m 2G \ +        -object memory-backend-file,id=ram0,size=2G,mem-path=/ +dev/shm/ +ram0,share=on\ +        -machine memory-backend=ram0 \ +        -machine aux-ram-share=on \ +        -drive file=$ROOTFS,media=disk,if=virtio \ +        -qmp unix:$QMPSOCK,server=on,wait=off \ +        -nographic \ +        -device qxl-vga +Run migration target: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-dst.sock +$EMULATOR -enable-kvm \ +        -machine q35 \ +        -cpu host -smp 2 -m 2G \ +        -object memory-backend-file,id=ram0,size=2G,mem-path=/ +dev/shm/ +ram0,share=on\ +        -machine memory-backend=ram0 \ +        -machine aux-ram-share=on \ +        -drive file=$ROOTFS,media=disk,if=virtio \ +        -qmp unix:$QMPSOCK,server=on,wait=off \ +        -nographic \ +        -device qxl-vga \ +        -incoming tcp:0:44444 \ +        -incoming '{"channel-type": "cpr", "addr": { "transport": +"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' +Launch the migration: +QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +QMPSOCK=/var/run/alma8qmp-src.sock + +$QMPSHELL -p $QMPSOCK < /dev/tty3 +            done +done + +echo "bug could not be reproduced" +exit 0 +The bug itself seems to 
remain unfixed, as I was able to reproduce +that +with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +cpr-transfer code also seems to be buggy as it triggers the crash - +without the cpr-transfer migration the above reproduce doesn't +lead to +crash on the source VM. + +I suspect that, as cpr-transfer doesn't migrate the guest +memory, but +rather passes it through the memory backend object, our code might +somehow corrupt the VRAM.  However, I wasn't able to trace the +corruption so far. + +Could somebody help the investigation and take a look into +this?  Any +suggestions would be appreciated.  Thanks! +Possibly some memory region created by qxl is not being preserved. +Try adding these traces to see what is preserved: + +-trace enable='*cpr*' +-trace enable='*ram_alloc*' +Also try adding this patch to see if it flags any ram blocks as not +compatible with cpr.  A message is printed at migration start time. +    +https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- +email- +steven.sistare@oracle.com/ + +- Steve +With the traces enabled + the "migration: ram block cpr blockers" +patch +applied: + +Source: +cpr_find_fd pc.bios, id 0 returns -1 +cpr_save_fd pc.bios, id 0, fd 22 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host +0x7fec18e00000 +cpr_find_fd pc.rom, id 0 returns -1 +cpr_save_fd pc.rom, id 0, fd 23 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host +0x7fec18c00000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 +cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 24 host 0x7fec18a00000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 25 host 0x7feb77e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 +qemu_ram_alloc_shared 
0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 27 host 0x7fec18800000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 28 host 0x7feb73c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 34 host 0x7fec18600000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 35 host 0x7fec18200000 +cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 36 host 0x7feb8b600000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +37 host 0x7feb8b400000 + +cpr_state_save cpr-transfer mode +cpr_transfer_output /var/run/alma8cpr-dst.sock +Target: +cpr_transfer_input /var/run/alma8cpr-dst.sock +cpr_state_load cpr-transfer mode +cpr_find_fd pc.bios, id 0 returns 20 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +0x7fcdc9800000 +cpr_find_fd pc.rom, id 0 returns 19 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +0x7fcdc9600000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 18 host 0x7fcdc9400000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 17 host 0x7fcd27e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 16 host 0x7fcdc9200000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 15 host 0x7fcd23c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 14 host 0x7fcdc8800000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 13 host 0x7fcdc8400000 +cpr_find_fd /rom@etc/table-loader, id 0 returns 11 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 11 host 0x7fcdc8200000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +10 host 0x7fcd3be00000 +Looks like both vga.vram and qxl.vram are being preserved (with the +same +addresses), and no incompatible ram blocks are found during migration. +Sorry, addressed are not the same, of course.  However corresponding +ram +blocks do seem to be preserved and initialized. +So far, I have not reproduced the guest driver failure. + +However, I have isolated places where new QEMU improperly writes to +the qxl memory regions prior to starting the guest, by mmap'ing them +readonly after cpr: + +    qemu_ram_alloc_internal() +      if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) +          ram_flags |= RAM_READONLY; +      new_block = qemu_ram_alloc_from_fd(...) + +I have attached a draft fix; try it and let me know. +My console window looks fine before and after cpr, using +-vnc $hostip:0 -vga qxl + +- Steve +Regarding the reproduce: when I launch the buggy version with the same +options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, +my VNC client silently hangs on the target after a while.  Could it +happen on your stand as well? +cpr does not preserve the vnc connection and session.  To test, I specify +port 0 for the source VM and port 1 for the dest.  When the src vnc goes +dormant the dest vnc becomes active. 
+Sure, I meant that VNC on the dest (on the port 1) works for a while +after the migration and then hangs, apparently after the guest QXL crash. +Could you try launching VM with +"-nographic -device qxl-vga"?  That way VM's serial console is given you +directly in the shell, so when qxl driver crashes you're still able to +inspect the kernel messages. +I have been running like that, but have not reproduced the qxl driver +crash, +and I suspect my guest image+kernel is too old. +Yes, that's probably the case. But the crash occurs on my Fedora 41 +guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to +be buggy. +However, once I realized the +issue was post-cpr modification of qxl memory, I switched my attention +to the +fix. +As for your patch, I can report that it doesn't resolve the issue as it +is.  But I was able to track down another possible memory corruption +using your approach with readonly mmap'ing: +Program terminated with signal SIGSEGV, Segmentation fault. +#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +412         d->ram->magic       = cpu_to_le32(QXL_RAM_MAGIC); +[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] +(gdb) bt +#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +#1  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, +errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 +#2  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, +errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 +#3  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, +errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 +#4  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, +value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 +#5  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, +v=0x5638996f3770, name=0x56389759b141 "realized", +opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) +      at ../qom/object.c:2374 +#6  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, 
+name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) +      at ../qom/object.c:1449 +#7  0x00005638970f8586 in object_property_set_qobject +(obj=0x5638996e0e70, name=0x56389759b141 "realized", +value=0x5638996df900, errp=0x7ffd3c2b84e0) +      at ../qom/qom-qobject.c:28 +#8  0x00005638970f3d8d in object_property_set_bool +(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, +errp=0x7ffd3c2b84e0) +      at ../qom/object.c:1519 +#9  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, +bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 +#10 0x0000563896dba675 in qdev_device_add_from_qdict +(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ +system/qdev-monitor.c:714 +#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, +errp=0x56389855dc40 ) at ../system/qdev-monitor.c:733 +#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, +opts=0x563898786150, errp=0x56389855dc40 ) at ../system/ +vl.c:1207 +#13 0x000056389737a6cc in qemu_opts_foreach +      (list=0x563898427b60 , func=0x563896dc48ca +, opaque=0x0, errp=0x56389855dc40 ) +      at ../util/qemu-option.c:1135 +#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ +vl.c:2745 +#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 +) at ../system/vl.c:2806 +#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) +at ../system/vl.c:3838 +#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ +system/main.c:72 +So the attached adjusted version of your patch does seem to help.  At +least I can't reproduce the crash on my stand. +Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram +are +definitely harmful.  Try V2 of the patch, attached, which skips the lines +of init_qxl_ram that modify guest memory. +Thanks, your v2 patch does seem to prevent the crash. Would you re-send +it to the list as a proper fix? 
+I'm wondering, could it be useful to explicitly mark all the reused
+memory regions readonly upon cpr-transfer, and then make them writable
+back again after the migration is done?  That way we will be segfaulting
+early on instead of debugging tricky memory corruptions.
+It's a useful debugging technique, but changing protection on a large
+memory region
+can be too expensive for production due to TLB shootdowns.
+
+Also, there are cases where writes are performed but the value is
+guaranteed to
+be the same:
+   qxl_post_load()
+     qxl_set_mode()
+       d->rom->mode = cpu_to_le32(modenr);
+The value is the same because mode and shadow_rom.mode were passed in
+vmstate
+from old qemu.
+There're also cases where devices' ROM might be re-initialized. E.g.
+this segfault occurs upon further exploration of RO mapped RAM blocks:
+Program terminated with signal SIGSEGV, Segmentation fault.
+#0 __memmove_avx_unaligned_erms () at
+../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664
+664 rep movsb
+[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))]
+(gdb) bt
+#0 __memmove_avx_unaligned_erms () at
+../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664
+#1 0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10,
+name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true)
+ at ../hw/core/loader.c:1032
+#2 0x000055aa1d031577 in rom_add_blob
+ (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152,
+addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables",
+fw_callback=0x55aa1d441f59 , callback_opaque=0x55aa20ff0010, as=0x0,
+read_only=true) at ../hw/core/loader.c:1147
+#3 0x000055aa1cfd788d in acpi_add_rom_blob
+ (update=0x55aa1d441f59 , opaque=0x55aa20ff0010,
+blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46
+#4 0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720
+#5 0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, 
data=0x0) +at ../hw/i386/pc.c:638 +#6 0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 +, data=0x0) at ../util/notify.c:39 +#7 0x000055aa1d039ee5 in qdev_machine_creation_done () at +../hw/core/machine.c:1749 +#8 0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 +) at ../system/vl.c:2779 +#9 0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 +) at ../system/vl.c:2807 +#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at +../system/vl.c:3838 +#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at +../system/main.c:72 +I'm not sure whether ACPI tables ROM in particular is rewritten with the +same content, but there might be cases where ROM can be read from file +system upon initialization. That is undesirable as guest kernel +certainly won't be too happy about sudden change of the device's ROM +content. + +So the issue we're dealing with here is any unwanted memory related +device initialization upon cpr. + +For now the only thing that comes to my mind is to make a test where we +put as many devices as we can into a VM, make ram blocks RO upon cpr +(and remap them as RW later after migration is done, if needed), and +catch any unwanted memory violations. As Den suggested, we might +consider adding that behaviour as a separate non-default option (or +"migrate" command flag specific to cpr-transfer), which would only be +used in the testing. + +Andrey +No way. ACPI with the source must be used in the same way as BIOSes +and optional ROMs. + +Den + +On 3/6/2025 10:52 AM, Denis V. 
Lunev wrote: +On 3/6/25 16:16, Andrey Drobyshev wrote: +On 3/5/25 11:19 PM, Steven Sistare wrote: +On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: +On 3/4/25 9:05 PM, Steven Sistare wrote: +On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: +On 2/28/25 8:35 PM, Andrey Drobyshev wrote: +On 2/28/25 8:20 PM, Steven Sistare wrote: +On 2/28/2025 1:13 PM, Steven Sistare wrote: +On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: +Hi all, + +We've been experimenting with cpr-transfer migration mode recently +and +have discovered the following issue with the guest QXL driver: + +Run migration source: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-src.sock + +$EMULATOR -enable-kvm \ +        -machine q35 \ +        -cpu host -smp 2 -m 2G \ +        -object memory-backend-file,id=ram0,size=2G,mem-path=/ +dev/shm/ +ram0,share=on\ +        -machine memory-backend=ram0 \ +        -machine aux-ram-share=on \ +        -drive file=$ROOTFS,media=disk,if=virtio \ +        -qmp unix:$QMPSOCK,server=on,wait=off \ +        -nographic \ +        -device qxl-vga +Run migration target: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-dst.sock +$EMULATOR -enable-kvm \ +        -machine q35 \ +        -cpu host -smp 2 -m 2G \ +        -object memory-backend-file,id=ram0,size=2G,mem-path=/ +dev/shm/ +ram0,share=on\ +        -machine memory-backend=ram0 \ +        -machine aux-ram-share=on \ +        -drive file=$ROOTFS,media=disk,if=virtio \ +        -qmp unix:$QMPSOCK,server=on,wait=off \ +        -nographic \ +        -device qxl-vga \ +        -incoming tcp:0:44444 \ +        -incoming '{"channel-type": "cpr", "addr": { "transport": +"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' +Launch the migration: +QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +QMPSOCK=/var/run/alma8qmp-src.sock + +$QMPSHELL -p $QMPSOCK < /dev/tty3 +            done +done + +echo "bug could not be reproduced" +exit 0 +The bug itself seems to 
remain unfixed, as I was able to reproduce +that +with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +cpr-transfer code also seems to be buggy as it triggers the crash - +without the cpr-transfer migration the above reproduce doesn't +lead to +crash on the source VM. + +I suspect that, as cpr-transfer doesn't migrate the guest +memory, but +rather passes it through the memory backend object, our code might +somehow corrupt the VRAM.  However, I wasn't able to trace the +corruption so far. + +Could somebody help the investigation and take a look into +this?  Any +suggestions would be appreciated.  Thanks! +Possibly some memory region created by qxl is not being preserved. +Try adding these traces to see what is preserved: + +-trace enable='*cpr*' +-trace enable='*ram_alloc*' +Also try adding this patch to see if it flags any ram blocks as not +compatible with cpr.  A message is printed at migration start time. +    +https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- +email- +steven.sistare@oracle.com/ + +- Steve +With the traces enabled + the "migration: ram block cpr blockers" +patch +applied: + +Source: +cpr_find_fd pc.bios, id 0 returns -1 +cpr_save_fd pc.bios, id 0, fd 22 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host +0x7fec18e00000 +cpr_find_fd pc.rom, id 0 returns -1 +cpr_save_fd pc.rom, id 0, fd 23 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host +0x7fec18c00000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 +cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 24 host 0x7fec18a00000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 25 host 0x7feb77e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 +qemu_ram_alloc_shared 
0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 27 host 0x7fec18800000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 28 host 0x7feb73c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 34 host 0x7fec18600000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 35 host 0x7fec18200000 +cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 36 host 0x7feb8b600000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +37 host 0x7feb8b400000 + +cpr_state_save cpr-transfer mode +cpr_transfer_output /var/run/alma8cpr-dst.sock +Target: +cpr_transfer_input /var/run/alma8cpr-dst.sock +cpr_state_load cpr-transfer mode +cpr_find_fd pc.bios, id 0 returns 20 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +0x7fcdc9800000 +cpr_find_fd pc.rom, id 0 returns 19 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +0x7fcdc9600000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 18 host 0x7fcdc9400000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 17 host 0x7fcd27e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 16 host 0x7fcdc9200000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+67108864 fd 15 host 0x7fcd23c00000
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+fd 14 host 0x7fcdc8800000
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+2097152 fd 13 host 0x7fcdc8400000
+cpr_find_fd /rom@etc/table-loader, id 0 returns 11
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+fd 11 host 0x7fcdc8200000
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+10 host 0x7fcd3be00000
+Looks like both vga.vram and qxl.vram are being preserved (with the
+same
+addresses), and no incompatible ram blocks are found during migration.
+Sorry, addresses are not the same, of course.  However corresponding
+ram
+blocks do seem to be preserved and initialized.
+So far, I have not reproduced the guest driver failure.
+
+However, I have isolated places where new QEMU improperly writes to
+the qxl memory regions prior to starting the guest, by mmap'ing them
+readonly after cpr:
+
+    qemu_ram_alloc_internal()
+      if (reused && (strstr(name, "qxl") || strstr(name, "vga")))
+          ram_flags |= RAM_READONLY;
+      new_block = qemu_ram_alloc_from_fd(...)
+
+I have attached a draft fix; try it and let me know.
+My console window looks fine before and after cpr, using
+-vnc $hostip:0 -vga qxl
+
+- Steve
+Regarding the reproducer: when I launch the buggy version with the same
+options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer,
+my VNC client silently hangs on the target after a while.  Could it
+happen on your stand as well?
+cpr does not preserve the vnc connection and session.  To test, I specify
+port 0 for the source VM and port 1 for the dest.  When the src vnc goes
+dormant the dest vnc becomes active. 
+Sure, I meant that VNC on the dest (on the port 1) works for a while +after the migration and then hangs, apparently after the guest QXL crash. +Could you try launching VM with +"-nographic -device qxl-vga"?  That way VM's serial console is given you +directly in the shell, so when qxl driver crashes you're still able to +inspect the kernel messages. +I have been running like that, but have not reproduced the qxl driver +crash, +and I suspect my guest image+kernel is too old. +Yes, that's probably the case.  But the crash occurs on my Fedora 41 +guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to +be buggy. +However, once I realized the +issue was post-cpr modification of qxl memory, I switched my attention +to the +fix. +As for your patch, I can report that it doesn't resolve the issue as it +is.  But I was able to track down another possible memory corruption +using your approach with readonly mmap'ing: +Program terminated with signal SIGSEGV, Segmentation fault. +#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +412         d->ram->magic       = cpu_to_le32(QXL_RAM_MAGIC); +[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] +(gdb) bt +#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +#1  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, +errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 +#2  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, +errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 +#3  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, +errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 +#4  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, +value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 +#5  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, +v=0x5638996f3770, name=0x56389759b141 "realized", +opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) +      at ../qom/object.c:2374 +#6  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, 
+name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) +      at ../qom/object.c:1449 +#7  0x00005638970f8586 in object_property_set_qobject +(obj=0x5638996e0e70, name=0x56389759b141 "realized", +value=0x5638996df900, errp=0x7ffd3c2b84e0) +      at ../qom/qom-qobject.c:28 +#8  0x00005638970f3d8d in object_property_set_bool +(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, +errp=0x7ffd3c2b84e0) +      at ../qom/object.c:1519 +#9  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, +bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 +#10 0x0000563896dba675 in qdev_device_add_from_qdict +(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ +system/qdev-monitor.c:714 +#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, +errp=0x56389855dc40 ) at ../system/qdev-monitor.c:733 +#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, +opts=0x563898786150, errp=0x56389855dc40 ) at ../system/ +vl.c:1207 +#13 0x000056389737a6cc in qemu_opts_foreach +      (list=0x563898427b60 , func=0x563896dc48ca +, opaque=0x0, errp=0x56389855dc40 ) +      at ../util/qemu-option.c:1135 +#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ +vl.c:2745 +#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 +) at ../system/vl.c:2806 +#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) +at ../system/vl.c:3838 +#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ +system/main.c:72 +So the attached adjusted version of your patch does seem to help.  At +least I can't reproduce the crash on my stand. +Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram +are +definitely harmful.  Try V2 of the patch, attached, which skips the lines +of init_qxl_ram that modify guest memory. +Thanks, your v2 patch does seem to prevent the crash.  Would you re-send +it to the list as a proper fix? +Yes. Was waiting for your confirmation. 
+I'm wondering, could it be useful to explicitly mark all the reused +memory regions readonly upon cpr-transfer, and then make them writable +back again after the migration is done?  That way we will be segfaulting +early on instead of debugging tricky memory corruptions. +It's a useful debugging technique, but changing protection on a large +memory region +can be too expensive for production due to TLB shootdowns. + +Also, there are cases where writes are performed but the value is +guaranteed to +be the same: +   qxl_post_load() +     qxl_set_mode() +       d->rom->mode = cpu_to_le32(modenr); +The value is the same because mode and shadow_rom.mode were passed in +vmstate +from old qemu. +There're also cases where devices' ROM might be re-initialized.  E.g. +this segfault occures upon further exploration of RO mapped RAM blocks: +Program terminated with signal SIGSEGV, Segmentation fault. +#0  __memmove_avx_unaligned_erms () at +../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 +664             rep     movsb +[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))] +(gdb) bt +#0  __memmove_avx_unaligned_erms () at +../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 +#1  0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, +name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true) +     at ../hw/core/loader.c:1032 +#2  0x000055aa1d031577 in rom_add_blob +     (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, +addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", +fw_callback=0x55aa1d441f59 , callback_opaque=0x55aa20ff0010, as=0x0, +read_only=true) at ../hw/core/loader.c:1147 +#3  0x000055aa1cfd788d in acpi_add_rom_blob +     (update=0x55aa1d441f59 , opaque=0x55aa20ff0010, +blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46 +#4  0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720 +#5  0x000055aa1d434199 in 
pc_machine_done (notifier=0x55aa1ff15050, data=0x0)
+at ../hw/i386/pc.c:638
+#6  0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10
+, data=0x0) at ../util/notify.c:39
+#7  0x000055aa1d039ee5 in qdev_machine_creation_done () at
+../hw/core/machine.c:1749
+#8  0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40
+) at ../system/vl.c:2779
+#9  0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40
+) at ../system/vl.c:2807
+#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at
+../system/vl.c:3838
+#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at
+../system/main.c:72
+I'm not sure whether ACPI tables ROM in particular is rewritten with the
+same content, but there might be cases where ROM can be read from file
+system upon initialization.  That is undesirable as guest kernel
+certainly won't be too happy about sudden change of the device's ROM
+content.
+
+So the issue we're dealing with here is any unwanted memory-related
+device initialization upon cpr.
+
+For now the only thing that comes to my mind is to make a test where we
+put as many devices as we can into a VM, make ram blocks RO upon cpr
+(and remap them as RW later after migration is done, if needed), and
+catch any unwanted memory violations.  As Den suggested, we might
+consider adding that behaviour as a separate non-default option (or
+"migrate" command flag specific to cpr-transfer), which would only be
+used in the testing.
+I'll look into adding an option, but there may be too many false positives,
+such as the qxl_set_mode case above.  And the maintainers may object to me
+eliminating the false positives by adding more CPR_IN tests, due to gratuitous
+(from their POV) ugliness.
+
+But I will use the technique to look for more write violations.
+Andrey
+No way. ACPI with the source must be used in the same way as BIOSes
+and optional ROMs.
+Yup, it's a bug.  Will fix. 
+
+- Steve
+
+see
+https://lore.kernel.org/qemu-devel/1741380954-341079-1-git-send-email-steven.sistare@oracle.com/
+- Steve
+
+On 3/6/2025 11:13 AM, Steven Sistare wrote:
+On 3/6/2025 10:52 AM, Denis V. Lunev wrote:
+On 3/6/25 16:16, Andrey Drobyshev wrote:
+On 3/5/25 11:19 PM, Steven Sistare wrote:
+On 3/5/2025 11:50 AM, Andrey Drobyshev wrote:
+On 3/4/25 9:05 PM, Steven Sistare wrote:
+On 2/28/2025 1:37 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:35 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:20 PM, Steven Sistare wrote:
+On 2/28/2025 1:13 PM, Steven Sistare wrote:
+On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+Hi all,
+
+We've been experimenting with cpr-transfer migration mode recently
+and
+have discovered the following issue with the guest QXL driver:
+
+Run migration source:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$EMULATOR -enable-kvm \
+        -machine q35 \
+        -cpu host -smp 2 -m 2G \
+        -object memory-backend-file,id=ram0,size=2G,mem-path=/
+dev/shm/
+ram0,share=on\
+        -machine memory-backend=ram0 \
+        -machine aux-ram-share=on \
+        -drive file=$ROOTFS,media=disk,if=virtio \
+        -qmp unix:$QMPSOCK,server=on,wait=off \
+        -nographic \
+        -device qxl-vga
+Run migration target:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-dst.sock
+$EMULATOR -enable-kvm \
+        -machine q35 \
+        -cpu host -smp 2 -m 2G \
+        -object memory-backend-file,id=ram0,size=2G,mem-path=/
+dev/shm/
+ram0,share=on\
+        -machine memory-backend=ram0 \
+        -machine aux-ram-share=on \
+        -drive file=$ROOTFS,media=disk,if=virtio \
+        -qmp unix:$QMPSOCK,server=on,wait=off \
+        -nographic \
+        -device qxl-vga \
+        -incoming tcp:0:44444 \
+        -incoming '{"channel-type": "cpr", "addr": { "transport":
+"socket", "type": "unix", "path": 
"/var/run/alma8cpr-dst.sock"}}' +Launch the migration: +QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +QMPSOCK=/var/run/alma8qmp-src.sock + +$QMPSHELL -p $QMPSOCK < /dev/tty3 +            done +done + +echo "bug could not be reproduced" +exit 0 +The bug itself seems to remain unfixed, as I was able to reproduce +that +with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +cpr-transfer code also seems to be buggy as it triggers the crash - +without the cpr-transfer migration the above reproduce doesn't +lead to +crash on the source VM. + +I suspect that, as cpr-transfer doesn't migrate the guest +memory, but +rather passes it through the memory backend object, our code might +somehow corrupt the VRAM.  However, I wasn't able to trace the +corruption so far. + +Could somebody help the investigation and take a look into +this?  Any +suggestions would be appreciated.  Thanks! +Possibly some memory region created by qxl is not being preserved. +Try adding these traces to see what is preserved: + +-trace enable='*cpr*' +-trace enable='*ram_alloc*' +Also try adding this patch to see if it flags any ram blocks as not +compatible with cpr.  A message is printed at migration start time. 
+    +https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- +email- +steven.sistare@oracle.com/ + +- Steve +With the traces enabled + the "migration: ram block cpr blockers" +patch +applied: + +Source: +cpr_find_fd pc.bios, id 0 returns -1 +cpr_save_fd pc.bios, id 0, fd 22 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host +0x7fec18e00000 +cpr_find_fd pc.rom, id 0 returns -1 +cpr_save_fd pc.rom, id 0, fd 23 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host +0x7fec18c00000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 +cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 24 host 0x7fec18a00000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 25 host 0x7feb77e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 27 host 0x7fec18800000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 28 host 0x7feb73c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 34 host 0x7fec18600000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 35 host 0x7fec18200000 +cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 36 host 0x7feb8b600000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +cpr_save_fd 
/rom@etc/acpi/rsdp, id 0, fd 37 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +37 host 0x7feb8b400000 + +cpr_state_save cpr-transfer mode +cpr_transfer_output /var/run/alma8cpr-dst.sock +Target: +cpr_transfer_input /var/run/alma8cpr-dst.sock +cpr_state_load cpr-transfer mode +cpr_find_fd pc.bios, id 0 returns 20 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +0x7fcdc9800000 +cpr_find_fd pc.rom, id 0 returns 19 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +0x7fcdc9600000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 18 host 0x7fcdc9400000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 17 host 0x7fcd27e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 16 host 0x7fcdc9200000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 15 host 0x7fcd23c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 14 host 0x7fcdc8800000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 13 host 0x7fcdc8400000 +cpr_find_fd /rom@etc/table-loader, id 0 returns 11 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 11 host 0x7fcdc8200000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +10 host 0x7fcd3be00000 +Looks like both vga.vram and qxl.vram are being preserved (with the +same +addresses), and no incompatible ram blocks are found during migration. +Sorry, addressed are not the same, of course.  
However corresponding +ram +blocks do seem to be preserved and initialized. +So far, I have not reproduced the guest driver failure. + +However, I have isolated places where new QEMU improperly writes to +the qxl memory regions prior to starting the guest, by mmap'ing them +readonly after cpr: + +    qemu_ram_alloc_internal() +      if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) +          ram_flags |= RAM_READONLY; +      new_block = qemu_ram_alloc_from_fd(...) + +I have attached a draft fix; try it and let me know. +My console window looks fine before and after cpr, using +-vnc $hostip:0 -vga qxl + +- Steve +Regarding the reproduce: when I launch the buggy version with the same +options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, +my VNC client silently hangs on the target after a while.  Could it +happen on your stand as well? +cpr does not preserve the vnc connection and session.  To test, I specify +port 0 for the source VM and port 1 for the dest.  When the src vnc goes +dormant the dest vnc becomes active. +Sure, I meant that VNC on the dest (on the port 1) works for a while +after the migration and then hangs, apparently after the guest QXL crash. +Could you try launching VM with +"-nographic -device qxl-vga"?  That way VM's serial console is given you +directly in the shell, so when qxl driver crashes you're still able to +inspect the kernel messages. +I have been running like that, but have not reproduced the qxl driver +crash, +and I suspect my guest image+kernel is too old. +Yes, that's probably the case.  But the crash occurs on my Fedora 41 +guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to +be buggy. +However, once I realized the +issue was post-cpr modification of qxl memory, I switched my attention +to the +fix. +As for your patch, I can report that it doesn't resolve the issue as it +is.  
But I was able to track down another possible memory corruption +using your approach with readonly mmap'ing: +Program terminated with signal SIGSEGV, Segmentation fault. +#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +412         d->ram->magic       = cpu_to_le32(QXL_RAM_MAGIC); +[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] +(gdb) bt +#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +#1  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, +errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 +#2  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, +errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 +#3  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, +errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 +#4  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, +value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 +#5  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, +v=0x5638996f3770, name=0x56389759b141 "realized", +opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) +      at ../qom/object.c:2374 +#6  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, +name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) +      at ../qom/object.c:1449 +#7  0x00005638970f8586 in object_property_set_qobject +(obj=0x5638996e0e70, name=0x56389759b141 "realized", +value=0x5638996df900, errp=0x7ffd3c2b84e0) +      at ../qom/qom-qobject.c:28 +#8  0x00005638970f3d8d in object_property_set_bool +(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, +errp=0x7ffd3c2b84e0) +      at ../qom/object.c:1519 +#9  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, +bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 +#10 0x0000563896dba675 in qdev_device_add_from_qdict +(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ +system/qdev-monitor.c:714 +#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, +errp=0x56389855dc40 ) at 
../system/qdev-monitor.c:733 +#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, +opts=0x563898786150, errp=0x56389855dc40 ) at ../system/ +vl.c:1207 +#13 0x000056389737a6cc in qemu_opts_foreach +      (list=0x563898427b60 , func=0x563896dc48ca +, opaque=0x0, errp=0x56389855dc40 ) +      at ../util/qemu-option.c:1135 +#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ +vl.c:2745 +#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 +) at ../system/vl.c:2806 +#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) +at ../system/vl.c:3838 +#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ +system/main.c:72 +So the attached adjusted version of your patch does seem to help.  At +least I can't reproduce the crash on my stand. +Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram +are +definitely harmful.  Try V2 of the patch, attached, which skips the lines +of init_qxl_ram that modify guest memory. +Thanks, your v2 patch does seem to prevent the crash.  Would you re-send +it to the list as a proper fix? +Yes.  Was waiting for your confirmation. +I'm wondering, could it be useful to explicitly mark all the reused +memory regions readonly upon cpr-transfer, and then make them writable +back again after the migration is done?  That way we will be segfaulting +early on instead of debugging tricky memory corruptions. +It's a useful debugging technique, but changing protection on a large +memory region +can be too expensive for production due to TLB shootdowns. + +Also, there are cases where writes are performed but the value is +guaranteed to +be the same: +   qxl_post_load() +     qxl_set_mode() +       d->rom->mode = cpu_to_le32(modenr); +The value is the same because mode and shadow_rom.mode were passed in +vmstate +from old qemu. +There're also cases where devices' ROM might be re-initialized.  E.g. 
+this segfault occurs upon further exploration of RO mapped RAM blocks: +Program terminated with signal SIGSEGV, Segmentation fault. +#0  __memmove_avx_unaligned_erms () at +../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 +664             rep     movsb +[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))] +(gdb) bt +#0  __memmove_avx_unaligned_erms () at +../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 +#1  0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, +name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true) +     at ../hw/core/loader.c:1032 +#2  0x000055aa1d031577 in rom_add_blob +     (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, +addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", +fw_callback=0x55aa1d441f59 , callback_opaque=0x55aa20ff0010, as=0x0, +read_only=true) at ../hw/core/loader.c:1147 +#3  0x000055aa1cfd788d in acpi_add_rom_blob +     (update=0x55aa1d441f59 , opaque=0x55aa20ff0010, +blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46 +#4  0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720 +#5  0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0) +at ../hw/i386/pc.c:638 +#6  0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 +, data=0x0) at ../util/notify.c:39 +#7  0x000055aa1d039ee5 in qdev_machine_creation_done () at +../hw/core/machine.c:1749 +#8  0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 +) at ../system/vl.c:2779 +#9  0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 +) at ../system/vl.c:2807 +#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at +../system/vl.c:3838 +#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at +../system/main.c:72 +I'm not sure whether ACPI tables ROM in particular is rewritten with the +same content, but there might be cases where ROM can be read
from file +system upon initialization.  That is undesirable as guest kernel +certainly won't be too happy about sudden change of the device's ROM +content. + +So the issue we're dealing with here is any unwanted memory related +device initialization upon cpr. + +For now the only thing that comes to my mind is to make a test where we +put as many devices as we can into a VM, make ram blocks RO upon cpr +(and remap them as RW later after migration is done, if needed), and +catch any unwanted memory violations.  As Den suggested, we might +consider adding that behaviour as a separate non-default option (or +"migrate" command flag specific to cpr-transfer), which would only be +used in the testing. +I'll look into adding an option, but there may be too many false positives, +such as the qxl_set_mode case above.  And the maintainers may object to me +eliminating the false positives by adding more CPR_IN tests, due to gratuitous +(from their POV) ugliness. + +But I will use the technique to look for more write violations. +Andrey +No way. ACPI with the source must be used in the same way as BIOSes +and optional ROMs. +Yup, its a bug.  Will fix. + +- Steve + diff --git a/classification_output/01/mistranslation/3886413 b/classification_output/01/mistranslation/3886413 deleted file mode 100644 index 5f79c452..00000000 --- a/classification_output/01/mistranslation/3886413 +++ /dev/null @@ -1,33 +0,0 @@ -mistranslation: 0.637 -instruction: 0.555 -other: 0.535 -semantic: 0.487 - -[Qemu-devel] [BUG] vhost-user: hot-unplug vhost-user nic for windows guest OS will fail with 100% reproduce rate - -Hi, guys - -I met a problem when hot-unplug vhost-user nic for Windows 2008 rc2 sp1 64 -(Guest OS) - -The xml of nic is as followed: - - - - - - -
- - -Firstly, I use virsh attach-device win2008 vif.xml to hot-plug a nic for the -Guest OS. This operation returns success. -After the guest OS discovers the nic successfully, I use virsh detach-device -win2008 vif.xml to hot-unplug it. This operation will fail with a 100% -reproduction rate. - -However, if I hot-plug and hot-unplug a virtio-net nic, it will not fail. - -I have analysed the process of qmp_device_del, and found that qemu injects an -interrupt into acpi to notify the guest OS to remove the nic. -I guess there is something wrong in how Windows handles the interrupt. - diff --git a/classification_output/01/mistranslation/4158985 b/classification_output/01/mistranslation/4158985 deleted file mode 100644 index 798c2e86..00000000 --- a/classification_output/01/mistranslation/4158985 +++ /dev/null @@ -1,1480 +0,0 @@ -mistranslation: 0.922 -other: 0.898 -semantic: 0.890 -instruction: 0.877 - -[BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device - -When I start qemu with a second virtio-net-ccw device (i.e. adding --device virtio-net-ccw in addition to the autogenerated device), I get -a segfault. gdb points to - -#0 0x000055d6ab52681d in virtio_net_get_config (vdev=, - config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146 -146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { - -(backtrace doesn't go further) - -Starting qemu with no additional "-device virtio-net-ccw" (i.e., only -the autogenerated virtio-net-ccw device is present) works. Specifying -several "-device virtio-net-pci" works as well. - -Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net -client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") -works (in-between state does not compile). - -This is reproducible with tcg as well. Same problem both with ---enable-vhost-vdpa and --disable-vhost-vdpa. - -Have not yet tried to figure out what might be special with -virtio-ccw... anyone have an idea?
- -[This should probably be considered a blocker?] - -On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: -> -When I start qemu with a second virtio-net-ccw device (i.e. adding -> --device virtio-net-ccw in addition to the autogenerated device), I get -> -a segfault. gdb points to -> -> -#0 0x000055d6ab52681d in virtio_net_get_config (vdev=, -> -config=0x55d6ad9e3f80 "RT") at -> -/home/cohuck/git/qemu/hw/net/virtio-net.c:146 -> -146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { -> -> -(backtrace doesn't go further) -> -> -Starting qemu with no additional "-device virtio-net-ccw" (i.e., only -> -the autogenerated virtio-net-ccw device is present) works. Specifying -> -several "-device virtio-net-pci" works as well. -> -> -Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net -> -client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") -> -works (in-between state does not compile). -Ouch. I didn't test all in-between states :( -But I wish we had a 0-day instrastructure like kernel has, -that catches things like that. - -> -This is reproducible with tcg as well. Same problem both with -> ---enable-vhost-vdpa and --disable-vhost-vdpa. -> -> -Have not yet tried to figure out what might be special with -> -virtio-ccw... anyone have an idea? -> -> -[This should probably be considered a blocker?] - -On Fri, 24 Jul 2020 09:30:58 -0400 -"Michael S. Tsirkin" wrote: - -> -On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: -> -> When I start qemu with a second virtio-net-ccw device (i.e. adding -> -> -device virtio-net-ccw in addition to the autogenerated device), I get -> -> a segfault. 
gdb points to -> -> -> -> #0 0x000055d6ab52681d in virtio_net_get_config (vdev=, -> -> config=0x55d6ad9e3f80 "RT") at -> -> /home/cohuck/git/qemu/hw/net/virtio-net.c:146 -> -> 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { -> -> -> -> (backtrace doesn't go further) -The core was incomplete, but running under gdb directly shows that it -is just a bog-standard config space access (first for that device). - -The cause of the crash is that nc->peer is not set... no idea how that -can happen, not that familiar with that part of QEMU. (Should the code -check, or is that really something that should not happen?) - -What I don't understand is why it is set correctly for the first, -autogenerated virtio-net-ccw device, but not for the second one, and -why virtio-net-pci doesn't show these problems. The only difference -between -ccw and -pci that comes to my mind here is that config space -accesses for ccw are done via an asynchronous operation, so timing -might be different. - -> -> -> -> Starting qemu with no additional "-device virtio-net-ccw" (i.e., only -> -> the autogenerated virtio-net-ccw device is present) works. Specifying -> -> several "-device virtio-net-pci" works as well. -> -> -> -> Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net -> -> client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") -> -> works (in-between state does not compile). -> -> -Ouch. I didn't test all in-between states :( -> -But I wish we had a 0-day instrastructure like kernel has, -> -that catches things like that. -Yep, that would be useful... so patchew only builds the complete series? - -> -> -> This is reproducible with tcg as well. Same problem both with -> -> --enable-vhost-vdpa and --disable-vhost-vdpa. -> -> -> -> Have not yet tried to figure out what might be special with -> -> virtio-ccw... anyone have an idea? -> -> -> -> [This should probably be considered a blocker?] 
-I think so, as it makes s390x unusable with more that one -virtio-net-ccw device, and I don't even see a workaround. - -On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: -> -On Fri, 24 Jul 2020 09:30:58 -0400 -> -"Michael S. Tsirkin" wrote: -> -> -> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: -> -> > When I start qemu with a second virtio-net-ccw device (i.e. adding -> -> > -device virtio-net-ccw in addition to the autogenerated device), I get -> -> > a segfault. gdb points to -> -> > -> -> > #0 0x000055d6ab52681d in virtio_net_get_config (vdev=, -> -> > config=0x55d6ad9e3f80 "RT") at -> -> > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 -> -> > 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { -> -> > -> -> > (backtrace doesn't go further) -> -> -The core was incomplete, but running under gdb directly shows that it -> -is just a bog-standard config space access (first for that device). -> -> -The cause of the crash is that nc->peer is not set... no idea how that -> -can happen, not that familiar with that part of QEMU. (Should the code -> -check, or is that really something that should not happen?) -> -> -What I don't understand is why it is set correctly for the first, -> -autogenerated virtio-net-ccw device, but not for the second one, and -> -why virtio-net-pci doesn't show these problems. The only difference -> -between -ccw and -pci that comes to my mind here is that config space -> -accesses for ccw are done via an asynchronous operation, so timing -> -might be different. -Hopefully Jason has an idea. Could you post a full command line -please? Do you need a working guest to trigger this? Does this trigger -on an x86 host? - -> -> > -> -> > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only -> -> > the autogenerated virtio-net-ccw device is present) works. Specifying -> -> > several "-device virtio-net-pci" works as well. 
-> -> > -> -> > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net -> -> > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") -> -> > works (in-between state does not compile). -> -> -> -> Ouch. I didn't test all in-between states :( -> -> But I wish we had a 0-day instrastructure like kernel has, -> -> that catches things like that. -> -> -Yep, that would be useful... so patchew only builds the complete series? -> -> -> -> -> > This is reproducible with tcg as well. Same problem both with -> -> > --enable-vhost-vdpa and --disable-vhost-vdpa. -> -> > -> -> > Have not yet tried to figure out what might be special with -> -> > virtio-ccw... anyone have an idea? -> -> > -> -> > [This should probably be considered a blocker?] -> -> -I think so, as it makes s390x unusable with more that one -> -virtio-net-ccw device, and I don't even see a workaround. - -On Fri, 24 Jul 2020 11:17:57 -0400 -"Michael S. Tsirkin" wrote: - -> -On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: -> -> On Fri, 24 Jul 2020 09:30:58 -0400 -> -> "Michael S. Tsirkin" wrote: -> -> -> -> > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: -> -> > > When I start qemu with a second virtio-net-ccw device (i.e. adding -> -> > > -device virtio-net-ccw in addition to the autogenerated device), I get -> -> > > a segfault. gdb points to -> -> > > -> -> > > #0 0x000055d6ab52681d in virtio_net_get_config (vdev=, -> -> > > config=0x55d6ad9e3f80 "RT") at -> -> > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 -> -> > > 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { -> -> > > -> -> > > (backtrace doesn't go further) -> -> -> -> The core was incomplete, but running under gdb directly shows that it -> -> is just a bog-standard config space access (first for that device). -> -> -> -> The cause of the crash is that nc->peer is not set... no idea how that -> -> can happen, not that familiar with that part of QEMU. 
(Should the code -> -> check, or is that really something that should not happen?) -> -> -> -> What I don't understand is why it is set correctly for the first, -> -> autogenerated virtio-net-ccw device, but not for the second one, and -> -> why virtio-net-pci doesn't show these problems. The only difference -> -> between -ccw and -pci that comes to my mind here is that config space -> -> accesses for ccw are done via an asynchronous operation, so timing -> -> might be different. -> -> -Hopefully Jason has an idea. Could you post a full command line -> -please? Do you need a working guest to trigger this? Does this trigger -> -on an x86 host? -Yes, it does trigger with tcg-on-x86 as well. I've been using - -s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on --m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 --drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 --device -scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 - --device virtio-net-ccw - -It seems it needs the guest actually doing something with the nics; I -cannot reproduce the crash if I use the old advent calendar moon buggy -image and just add a virtio-net-ccw device. - -(I don't think it's a problem with my local build, as I see the problem -both on my laptop and on an LPAR.) - -> -> -> > > -> -> > > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only -> -> > > the autogenerated virtio-net-ccw device is present) works. Specifying -> -> > > several "-device virtio-net-pci" works as well. -> -> > > -> -> > > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net -> -> > > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") -> -> > > works (in-between state does not compile). -> -> > -> -> > Ouch. I didn't test all in-between states :( -> -> > But I wish we had a 0-day instrastructure like kernel has, -> -> > that catches things like that. 
-> -> -> -> Yep, that would be useful... so patchew only builds the complete series? -> -> -> -> > -> -> > > This is reproducible with tcg as well. Same problem both with -> -> > > --enable-vhost-vdpa and --disable-vhost-vdpa. -> -> > > -> -> > > Have not yet tried to figure out what might be special with -> -> > > virtio-ccw... anyone have an idea? -> -> > > -> -> > > [This should probably be considered a blocker?] -> -> -> -> I think so, as it makes s390x unusable with more that one -> -> virtio-net-ccw device, and I don't even see a workaround. -> - -On 2020/7/24 下午11:34, Cornelia Huck wrote: -On Fri, 24 Jul 2020 11:17:57 -0400 -"Michael S. Tsirkin" wrote: -On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: -On Fri, 24 Jul 2020 09:30:58 -0400 -"Michael S. Tsirkin" wrote: -On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: -When I start qemu with a second virtio-net-ccw device (i.e. adding --device virtio-net-ccw in addition to the autogenerated device), I get -a segfault. gdb points to - -#0 0x000055d6ab52681d in virtio_net_get_config (vdev=, - config=0x55d6ad9e3f80 "RT") at -/home/cohuck/git/qemu/hw/net/virtio-net.c:146 -146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { - -(backtrace doesn't go further) -The core was incomplete, but running under gdb directly shows that it -is just a bog-standard config space access (first for that device). - -The cause of the crash is that nc->peer is not set... no idea how that -can happen, not that familiar with that part of QEMU. (Should the code -check, or is that really something that should not happen?) - -What I don't understand is why it is set correctly for the first, -autogenerated virtio-net-ccw device, but not for the second one, and -why virtio-net-pci doesn't show these problems. The only difference -between -ccw and -pci that comes to my mind here is that config space -accesses for ccw are done via an asynchronous operation, so timing -might be different. 
-Hopefully Jason has an idea. Could you post a full command line -please? Do you need a working guest to trigger this? Does this trigger -on an x86 host? -Yes, it does trigger with tcg-on-x86 as well. I've been using - -s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on --m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 --drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 --device -scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 --device virtio-net-ccw - -It seems it needs the guest actually doing something with the nics; I -cannot reproduce the crash if I use the old advent calendar moon buggy -image and just add a virtio-net-ccw device. - -(I don't think it's a problem with my local build, as I see the problem -both on my laptop and on an LPAR.) -It looks to me we forget the check the existence of peer. - -Please try the attached patch to see if it works. - -Thanks -0001-virtio-net-check-the-existence-of-peer-before-accesi.patch -Description: -Text Data - -On Sat, 25 Jul 2020 08:40:07 +0800 -Jason Wang wrote: - -> -On 2020/7/24 下午11:34, Cornelia Huck wrote: -> -> On Fri, 24 Jul 2020 11:17:57 -0400 -> -> "Michael S. Tsirkin" wrote: -> -> -> ->> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: -> ->>> On Fri, 24 Jul 2020 09:30:58 -0400 -> ->>> "Michael S. Tsirkin" wrote: -> ->>> -> ->>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: -> ->>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding -> ->>>>> -device virtio-net-ccw in addition to the autogenerated device), I get -> ->>>>> a segfault. 
gdb points to -> ->>>>> -> ->>>>> #0 0x000055d6ab52681d in virtio_net_get_config (vdev=, -> ->>>>> config=0x55d6ad9e3f80 "RT") at -> ->>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146 -> ->>>>> 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { -> ->>>>> -> ->>>>> (backtrace doesn't go further) -> ->>> The core was incomplete, but running under gdb directly shows that it -> ->>> is just a bog-standard config space access (first for that device). -> ->>> -> ->>> The cause of the crash is that nc->peer is not set... no idea how that -> ->>> can happen, not that familiar with that part of QEMU. (Should the code -> ->>> check, or is that really something that should not happen?) -> ->>> -> ->>> What I don't understand is why it is set correctly for the first, -> ->>> autogenerated virtio-net-ccw device, but not for the second one, and -> ->>> why virtio-net-pci doesn't show these problems. The only difference -> ->>> between -ccw and -pci that comes to my mind here is that config space -> ->>> accesses for ccw are done via an asynchronous operation, so timing -> ->>> might be different. -> ->> Hopefully Jason has an idea. Could you post a full command line -> ->> please? Do you need a working guest to trigger this? Does this trigger -> ->> on an x86 host? -> -> Yes, it does trigger with tcg-on-x86 as well. I've been using -> -> -> -> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu -> -> qemu,zpci=on -> -> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 -> -> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 -> -> -device -> -> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -> -> -device virtio-net-ccw -> -> -> -> It seems it needs the guest actually doing something with the nics; I -> -> cannot reproduce the crash if I use the old advent calendar moon buggy -> -> image and just add a virtio-net-ccw device. 
-> -> -> -> (I don't think it's a problem with my local build, as I see the problem -> -> both on my laptop and on an LPAR.) -> -> -> -It looks to me we forget the check the existence of peer. -> -> -Please try the attached patch to see if it works. -Thanks, that patch gets my guest up and running again. So, FWIW, - -Tested-by: Cornelia Huck - -Any idea why this did not hit with virtio-net-pci (or the autogenerated -virtio-net-ccw device)? - -On 2020/7/27 下午2:43, Cornelia Huck wrote: -On Sat, 25 Jul 2020 08:40:07 +0800 -Jason Wang wrote: -On 2020/7/24 下午11:34, Cornelia Huck wrote: -On Fri, 24 Jul 2020 11:17:57 -0400 -"Michael S. Tsirkin" wrote: -On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: -On Fri, 24 Jul 2020 09:30:58 -0400 -"Michael S. Tsirkin" wrote: -On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: -When I start qemu with a second virtio-net-ccw device (i.e. adding --device virtio-net-ccw in addition to the autogenerated device), I get -a segfault. gdb points to - -#0 0x000055d6ab52681d in virtio_net_get_config (vdev=, - config=0x55d6ad9e3f80 "RT") at -/home/cohuck/git/qemu/hw/net/virtio-net.c:146 -146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { - -(backtrace doesn't go further) -The core was incomplete, but running under gdb directly shows that it -is just a bog-standard config space access (first for that device). - -The cause of the crash is that nc->peer is not set... no idea how that -can happen, not that familiar with that part of QEMU. (Should the code -check, or is that really something that should not happen?) - -What I don't understand is why it is set correctly for the first, -autogenerated virtio-net-ccw device, but not for the second one, and -why virtio-net-pci doesn't show these problems. The only difference -between -ccw and -pci that comes to my mind here is that config space -accesses for ccw are done via an asynchronous operation, so timing -might be different. -Hopefully Jason has an idea. 
Could you post a full command line -please? Do you need a working guest to trigger this? Does this trigger -on an x86 host? -Yes, it does trigger with tcg-on-x86 as well. I've been using - -s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on --m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 --drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 --device -scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 --device virtio-net-ccw - -It seems it needs the guest actually doing something with the nics; I -cannot reproduce the crash if I use the old advent calendar moon buggy -image and just add a virtio-net-ccw device. - -(I don't think it's a problem with my local build, as I see the problem -both on my laptop and on an LPAR.) -It looks to me we forget the check the existence of peer. - -Please try the attached patch to see if it works. -Thanks, that patch gets my guest up and running again. So, FWIW, - -Tested-by: Cornelia Huck - -Any idea why this did not hit with virtio-net-pci (or the autogenerated -virtio-net-ccw device)? -It can be hit with virtio-net-pci as well (just start without peer). -For autogenerated virtio-net-cww, I think the reason is that it has -already had a peer set. -Thanks - -On Mon, 27 Jul 2020 15:38:12 +0800 -Jason Wang wrote: - -> -On 2020/7/27 下午2:43, Cornelia Huck wrote: -> -> On Sat, 25 Jul 2020 08:40:07 +0800 -> -> Jason Wang wrote: -> -> -> ->> On 2020/7/24 下午11:34, Cornelia Huck wrote: -> ->>> On Fri, 24 Jul 2020 11:17:57 -0400 -> ->>> "Michael S. Tsirkin" wrote: -> ->>> -> ->>>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: -> ->>>>> On Fri, 24 Jul 2020 09:30:58 -0400 -> ->>>>> "Michael S. Tsirkin" wrote: -> ->>>>> -> ->>>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: -> ->>>>>>> When I start qemu with a second virtio-net-ccw device (i.e. 
adding -> ->>>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get -> ->>>>>>> a segfault. gdb points to -> ->>>>>>> -> ->>>>>>> #0 0x000055d6ab52681d in virtio_net_get_config (vdev=, -> ->>>>>>> config=0x55d6ad9e3f80 "RT") at -> ->>>>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146 -> ->>>>>>> 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { -> ->>>>>>> -> ->>>>>>> (backtrace doesn't go further) -> ->>>>> The core was incomplete, but running under gdb directly shows that it -> ->>>>> is just a bog-standard config space access (first for that device). -> ->>>>> -> ->>>>> The cause of the crash is that nc->peer is not set... no idea how that -> ->>>>> can happen, not that familiar with that part of QEMU. (Should the code -> ->>>>> check, or is that really something that should not happen?) -> ->>>>> -> ->>>>> What I don't understand is why it is set correctly for the first, -> ->>>>> autogenerated virtio-net-ccw device, but not for the second one, and -> ->>>>> why virtio-net-pci doesn't show these problems. The only difference -> ->>>>> between -ccw and -pci that comes to my mind here is that config space -> ->>>>> accesses for ccw are done via an asynchronous operation, so timing -> ->>>>> might be different. -> ->>>> Hopefully Jason has an idea. Could you post a full command line -> ->>>> please? Do you need a working guest to trigger this? Does this trigger -> ->>>> on an x86 host? -> ->>> Yes, it does trigger with tcg-on-x86 as well. 
I've been using -> ->>> -> ->>> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu -> ->>> qemu,zpci=on -> ->>> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 -> ->>> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 -> ->>> -device -> ->>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -> ->>> -device virtio-net-ccw -> ->>> -> ->>> It seems it needs the guest actually doing something with the nics; I -> ->>> cannot reproduce the crash if I use the old advent calendar moon buggy -> ->>> image and just add a virtio-net-ccw device. -> ->>> -> ->>> (I don't think it's a problem with my local build, as I see the problem -> ->>> both on my laptop and on an LPAR.) -> ->> -> ->> It looks to me we forget the check the existence of peer. -> ->> -> ->> Please try the attached patch to see if it works. -> -> Thanks, that patch gets my guest up and running again. So, FWIW, -> -> -> -> Tested-by: Cornelia Huck -> -> -> -> Any idea why this did not hit with virtio-net-pci (or the autogenerated -> -> virtio-net-ccw device)? -> -> -> -It can be hit with virtio-net-pci as well (just start without peer). -Hm, I had not been able to reproduce the crash with a 'naked' -device -virtio-net-pci. But checking seems to be the right idea anyway. - -> -> -For autogenerated virtio-net-cww, I think the reason is that it has -> -already had a peer set. -Ok, that might well be. - -On 2020/7/27 下午4:41, Cornelia Huck wrote: -On Mon, 27 Jul 2020 15:38:12 +0800 -Jason Wang wrote: -On 2020/7/27 下午2:43, Cornelia Huck wrote: -On Sat, 25 Jul 2020 08:40:07 +0800 -Jason Wang wrote: -On 2020/7/24 下午11:34, Cornelia Huck wrote: -On Fri, 24 Jul 2020 11:17:57 -0400 -"Michael S. Tsirkin" wrote: -On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: -On Fri, 24 Jul 2020 09:30:58 -0400 -"Michael S. 
Tsirkin" wrote: -On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: -When I start qemu with a second virtio-net-ccw device (i.e. adding --device virtio-net-ccw in addition to the autogenerated device), I get -a segfault. gdb points to - -#0 0x000055d6ab52681d in virtio_net_get_config (vdev=, - config=0x55d6ad9e3f80 "RT") at -/home/cohuck/git/qemu/hw/net/virtio-net.c:146 -146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { - -(backtrace doesn't go further) -The core was incomplete, but running under gdb directly shows that it -is just a bog-standard config space access (first for that device). - -The cause of the crash is that nc->peer is not set... no idea how that -can happen, not that familiar with that part of QEMU. (Should the code -check, or is that really something that should not happen?) - -What I don't understand is why it is set correctly for the first, -autogenerated virtio-net-ccw device, but not for the second one, and -why virtio-net-pci doesn't show these problems. The only difference -between -ccw and -pci that comes to my mind here is that config space -accesses for ccw are done via an asynchronous operation, so timing -might be different. -Hopefully Jason has an idea. Could you post a full command line -please? Do you need a working guest to trigger this? Does this trigger -on an x86 host? -Yes, it does trigger with tcg-on-x86 as well. I've been using - -s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on --m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 --drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 --device -scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 --device virtio-net-ccw - -It seems it needs the guest actually doing something with the nics; I -cannot reproduce the crash if I use the old advent calendar moon buggy -image and just add a virtio-net-ccw device. 
- -(I don't think it's a problem with my local build, as I see the problem -both on my laptop and on an LPAR.) -It looks to me we forget the check the existence of peer. - -Please try the attached patch to see if it works. -Thanks, that patch gets my guest up and running again. So, FWIW, - -Tested-by: Cornelia Huck - -Any idea why this did not hit with virtio-net-pci (or the autogenerated -virtio-net-ccw device)? -It can be hit with virtio-net-pci as well (just start without peer). -Hm, I had not been able to reproduce the crash with a 'naked' -device -virtio-net-pci. But checking seems to be the right idea anyway. -Sorry for being unclear, I meant for networking part, you just need -start without peer, and you need a real guest (any Linux) that is trying -to access the config space of virtio-net. -Thanks -For autogenerated virtio-net-cww, I think the reason is that it has -already had a peer set. -Ok, that might well be. - -On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: -> -> -On 2020/7/27 下午4:41, Cornelia Huck wrote: -> -> On Mon, 27 Jul 2020 15:38:12 +0800 -> -> Jason Wang wrote: -> -> -> -> > On 2020/7/27 下午2:43, Cornelia Huck wrote: -> -> > > On Sat, 25 Jul 2020 08:40:07 +0800 -> -> > > Jason Wang wrote: -> -> > > > On 2020/7/24 下午11:34, Cornelia Huck wrote: -> -> > > > > On Fri, 24 Jul 2020 11:17:57 -0400 -> -> > > > > "Michael S. Tsirkin" wrote: -> -> > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: -> -> > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400 -> -> > > > > > > "Michael S. Tsirkin" wrote: -> -> > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: -> -> > > > > > > > > When I start qemu with a second virtio-net-ccw device (i.e. -> -> > > > > > > > > adding -> -> > > > > > > > > -device virtio-net-ccw in addition to the autogenerated -> -> > > > > > > > > device), I get -> -> > > > > > > > > a segfault. 
gdb points to -> -> > > > > > > > > -> -> > > > > > > > > #0 0x000055d6ab52681d in virtio_net_get_config -> -> > > > > > > > > (vdev=, -> -> > > > > > > > > config=0x55d6ad9e3f80 "RT") at -> -> > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 -> -> > > > > > > > > 146 if (nc->peer->info->type == -> -> > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) { -> -> > > > > > > > > -> -> > > > > > > > > (backtrace doesn't go further) -> -> > > > > > > The core was incomplete, but running under gdb directly shows -> -> > > > > > > that it -> -> > > > > > > is just a bog-standard config space access (first for that -> -> > > > > > > device). -> -> > > > > > > -> -> > > > > > > The cause of the crash is that nc->peer is not set... no idea -> -> > > > > > > how that -> -> > > > > > > can happen, not that familiar with that part of QEMU. (Should -> -> > > > > > > the code -> -> > > > > > > check, or is that really something that should not happen?) -> -> > > > > > > -> -> > > > > > > What I don't understand is why it is set correctly for the -> -> > > > > > > first, -> -> > > > > > > autogenerated virtio-net-ccw device, but not for the second -> -> > > > > > > one, and -> -> > > > > > > why virtio-net-pci doesn't show these problems. The only -> -> > > > > > > difference -> -> > > > > > > between -ccw and -pci that comes to my mind here is that config -> -> > > > > > > space -> -> > > > > > > accesses for ccw are done via an asynchronous operation, so -> -> > > > > > > timing -> -> > > > > > > might be different. -> -> > > > > > Hopefully Jason has an idea. Could you post a full command line -> -> > > > > > please? Do you need a working guest to trigger this? Does this -> -> > > > > > trigger -> -> > > > > > on an x86 host? -> -> > > > > Yes, it does trigger with tcg-on-x86 as well. 
I've been using -> -> > > > > -> -> > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu -> -> > > > > qemu,zpci=on -> -> > > > > -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 -> -> > > > > -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 -> -> > > > > -device -> -> > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -> -> > > > > -device virtio-net-ccw -> -> > > > > -> -> > > > > It seems it needs the guest actually doing something with the nics; -> -> > > > > I -> -> > > > > cannot reproduce the crash if I use the old advent calendar moon -> -> > > > > buggy -> -> > > > > image and just add a virtio-net-ccw device. -> -> > > > > -> -> > > > > (I don't think it's a problem with my local build, as I see the -> -> > > > > problem -> -> > > > > both on my laptop and on an LPAR.) -> -> > > > It looks to me we forget the check the existence of peer. -> -> > > > -> -> > > > Please try the attached patch to see if it works. -> -> > > Thanks, that patch gets my guest up and running again. So, FWIW, -> -> > > -> -> > > Tested-by: Cornelia Huck -> -> > > -> -> > > Any idea why this did not hit with virtio-net-pci (or the autogenerated -> -> > > virtio-net-ccw device)? -> -> > -> -> > It can be hit with virtio-net-pci as well (just start without peer). -> -> Hm, I had not been able to reproduce the crash with a 'naked' -device -> -> virtio-net-pci. But checking seems to be the right idea anyway. -> -> -> -Sorry for being unclear, I meant for networking part, you just need start -> -without peer, and you need a real guest (any Linux) that is trying to access -> -the config space of virtio-net. -> -> -Thanks -A pxe guest will do it, but that doesn't support ccw, right? - -I'm still unclear why this triggers with ccw but not pci - -any idea? - -> -> -> -> -> > For autogenerated virtio-net-cww, I think the reason is that it has -> -> > already had a peer set. 
-> -> Ok, that might well be. -> -> -> -> - -On 2020/7/27 下午7:43, Michael S. Tsirkin wrote: -On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: -On 2020/7/27 下午4:41, Cornelia Huck wrote: -On Mon, 27 Jul 2020 15:38:12 +0800 -Jason Wang wrote: -On 2020/7/27 下午2:43, Cornelia Huck wrote: -On Sat, 25 Jul 2020 08:40:07 +0800 -Jason Wang wrote: -On 2020/7/24 下午11:34, Cornelia Huck wrote: -On Fri, 24 Jul 2020 11:17:57 -0400 -"Michael S. Tsirkin" wrote: -On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: -On Fri, 24 Jul 2020 09:30:58 -0400 -"Michael S. Tsirkin" wrote: -On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: -When I start qemu with a second virtio-net-ccw device (i.e. adding --device virtio-net-ccw in addition to the autogenerated device), I get -a segfault. gdb points to - -#0 0x000055d6ab52681d in virtio_net_get_config (vdev=, - config=0x55d6ad9e3f80 "RT") at -/home/cohuck/git/qemu/hw/net/virtio-net.c:146 -146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { - -(backtrace doesn't go further) -The core was incomplete, but running under gdb directly shows that it -is just a bog-standard config space access (first for that device). - -The cause of the crash is that nc->peer is not set... no idea how that -can happen, not that familiar with that part of QEMU. (Should the code -check, or is that really something that should not happen?) - -What I don't understand is why it is set correctly for the first, -autogenerated virtio-net-ccw device, but not for the second one, and -why virtio-net-pci doesn't show these problems. The only difference -between -ccw and -pci that comes to my mind here is that config space -accesses for ccw are done via an asynchronous operation, so timing -might be different. -Hopefully Jason has an idea. Could you post a full command line -please? Do you need a working guest to trigger this? Does this trigger -on an x86 host? -Yes, it does trigger with tcg-on-x86 as well. 
I've been using - -s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on --m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 --drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 --device -scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 --device virtio-net-ccw - -It seems it needs the guest actually doing something with the nics; I -cannot reproduce the crash if I use the old advent calendar moon buggy -image and just add a virtio-net-ccw device. - -(I don't think it's a problem with my local build, as I see the problem -both on my laptop and on an LPAR.) -It looks to me we forget the check the existence of peer. - -Please try the attached patch to see if it works. -Thanks, that patch gets my guest up and running again. So, FWIW, - -Tested-by: Cornelia Huck - -Any idea why this did not hit with virtio-net-pci (or the autogenerated -virtio-net-ccw device)? -It can be hit with virtio-net-pci as well (just start without peer). -Hm, I had not been able to reproduce the crash with a 'naked' -device -virtio-net-pci. But checking seems to be the right idea anyway. -Sorry for being unclear, I meant for networking part, you just need start -without peer, and you need a real guest (any Linux) that is trying to access -the config space of virtio-net. - -Thanks -A pxe guest will do it, but that doesn't support ccw, right? -Yes, it depends on the cli actually. -I'm still unclear why this triggers with ccw but not pci - -any idea? -I don't test pxe but I can reproduce this with pci (just start a linux -guest without a peer). -Thanks - -On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote: -> -> -On 2020/7/27 下午7:43, Michael S. 
Tsirkin wrote: -> -> On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: -> -> > On 2020/7/27 下午4:41, Cornelia Huck wrote: -> -> > > On Mon, 27 Jul 2020 15:38:12 +0800 -> -> > > Jason Wang wrote: -> -> > > -> -> > > > On 2020/7/27 下午2:43, Cornelia Huck wrote: -> -> > > > > On Sat, 25 Jul 2020 08:40:07 +0800 -> -> > > > > Jason Wang wrote: -> -> > > > > > On 2020/7/24 下午11:34, Cornelia Huck wrote: -> -> > > > > > > On Fri, 24 Jul 2020 11:17:57 -0400 -> -> > > > > > > "Michael S. Tsirkin" wrote: -> -> > > > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: -> -> > > > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400 -> -> > > > > > > > > "Michael S. Tsirkin" wrote: -> -> > > > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck -> -> > > > > > > > > > wrote: -> -> > > > > > > > > > > When I start qemu with a second virtio-net-ccw device -> -> > > > > > > > > > > (i.e. adding -> -> > > > > > > > > > > -device virtio-net-ccw in addition to the autogenerated -> -> > > > > > > > > > > device), I get -> -> > > > > > > > > > > a segfault. gdb points to -> -> > > > > > > > > > > -> -> > > > > > > > > > > #0 0x000055d6ab52681d in virtio_net_get_config -> -> > > > > > > > > > > (vdev=, -> -> > > > > > > > > > > config=0x55d6ad9e3f80 "RT") at -> -> > > > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 -> -> > > > > > > > > > > 146 if (nc->peer->info->type == -> -> > > > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) { -> -> > > > > > > > > > > -> -> > > > > > > > > > > (backtrace doesn't go further) -> -> > > > > > > > > The core was incomplete, but running under gdb directly -> -> > > > > > > > > shows that it -> -> > > > > > > > > is just a bog-standard config space access (first for that -> -> > > > > > > > > device). -> -> > > > > > > > > -> -> > > > > > > > > The cause of the crash is that nc->peer is not set... 
no -> -> > > > > > > > > idea how that -> -> > > > > > > > > can happen, not that familiar with that part of QEMU. -> -> > > > > > > > > (Should the code -> -> > > > > > > > > check, or is that really something that should not happen?) -> -> > > > > > > > > -> -> > > > > > > > > What I don't understand is why it is set correctly for the -> -> > > > > > > > > first, -> -> > > > > > > > > autogenerated virtio-net-ccw device, but not for the second -> -> > > > > > > > > one, and -> -> > > > > > > > > why virtio-net-pci doesn't show these problems. The only -> -> > > > > > > > > difference -> -> > > > > > > > > between -ccw and -pci that comes to my mind here is that -> -> > > > > > > > > config space -> -> > > > > > > > > accesses for ccw are done via an asynchronous operation, so -> -> > > > > > > > > timing -> -> > > > > > > > > might be different. -> -> > > > > > > > Hopefully Jason has an idea. Could you post a full command -> -> > > > > > > > line -> -> > > > > > > > please? Do you need a working guest to trigger this? Does -> -> > > > > > > > this trigger -> -> > > > > > > > on an x86 host? -> -> > > > > > > Yes, it does trigger with tcg-on-x86 as well. 
I've been using -> -> > > > > > > -> -> > > > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -> -> > > > > > > -cpu qemu,zpci=on -> -> > > > > > > -m 1024 -nographic -device -> -> > > > > > > virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 -> -> > > > > > > -drive -> -> > > > > > > file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 -> -> > > > > > > -device -> -> > > > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -> -> > > > > > > -device virtio-net-ccw -> -> > > > > > > -> -> > > > > > > It seems it needs the guest actually doing something with the -> -> > > > > > > nics; I -> -> > > > > > > cannot reproduce the crash if I use the old advent calendar -> -> > > > > > > moon buggy -> -> > > > > > > image and just add a virtio-net-ccw device. -> -> > > > > > > -> -> > > > > > > (I don't think it's a problem with my local build, as I see the -> -> > > > > > > problem -> -> > > > > > > both on my laptop and on an LPAR.) -> -> > > > > > It looks to me we forget the check the existence of peer. -> -> > > > > > -> -> > > > > > Please try the attached patch to see if it works. -> -> > > > > Thanks, that patch gets my guest up and running again. So, FWIW, -> -> > > > > -> -> > > > > Tested-by: Cornelia Huck -> -> > > > > -> -> > > > > Any idea why this did not hit with virtio-net-pci (or the -> -> > > > > autogenerated -> -> > > > > virtio-net-ccw device)? -> -> > > > It can be hit with virtio-net-pci as well (just start without peer). -> -> > > Hm, I had not been able to reproduce the crash with a 'naked' -device -> -> > > virtio-net-pci. But checking seems to be the right idea anyway. -> -> > Sorry for being unclear, I meant for networking part, you just need start -> -> > without peer, and you need a real guest (any Linux) that is trying to -> -> > access -> -> > the config space of virtio-net. -> -> > -> -> > Thanks -> -> A pxe guest will do it, but that doesn't support ccw, right? 
-> -> -> -Yes, it depends on the cli actually. -> -> -> -> -> -> I'm still unclear why this triggers with ccw but not pci - -> -> any idea? -> -> -> -I don't test pxe but I can reproduce this with pci (just start a linux guest -> -without a peer). -> -> -Thanks -> -Might be a good addition to a unit test. Not sure what would the -test do exactly: just make sure guest runs? Looks like a lot of work -for an empty test ... maybe we can poke at the guest config with -qtest commands at least. - --- -MST - -On 2020/7/27 下午9:16, Michael S. Tsirkin wrote: -On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote: -On 2020/7/27 下午7:43, Michael S. Tsirkin wrote: -On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: -On 2020/7/27 下午4:41, Cornelia Huck wrote: -On Mon, 27 Jul 2020 15:38:12 +0800 -Jason Wang wrote: -On 2020/7/27 下午2:43, Cornelia Huck wrote: -On Sat, 25 Jul 2020 08:40:07 +0800 -Jason Wang wrote: -On 2020/7/24 下午11:34, Cornelia Huck wrote: -On Fri, 24 Jul 2020 11:17:57 -0400 -"Michael S. Tsirkin" wrote: -On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: -On Fri, 24 Jul 2020 09:30:58 -0400 -"Michael S. Tsirkin" wrote: -On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: -When I start qemu with a second virtio-net-ccw device (i.e. adding --device virtio-net-ccw in addition to the autogenerated device), I get -a segfault. gdb points to - -#0 0x000055d6ab52681d in virtio_net_get_config (vdev=, - config=0x55d6ad9e3f80 "RT") at -/home/cohuck/git/qemu/hw/net/virtio-net.c:146 -146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { - -(backtrace doesn't go further) -The core was incomplete, but running under gdb directly shows that it -is just a bog-standard config space access (first for that device). - -The cause of the crash is that nc->peer is not set... no idea how that -can happen, not that familiar with that part of QEMU. (Should the code -check, or is that really something that should not happen?) 
- -What I don't understand is why it is set correctly for the first, -autogenerated virtio-net-ccw device, but not for the second one, and -why virtio-net-pci doesn't show these problems. The only difference -between -ccw and -pci that comes to my mind here is that config space -accesses for ccw are done via an asynchronous operation, so timing -might be different. -Hopefully Jason has an idea. Could you post a full command line -please? Do you need a working guest to trigger this? Does this trigger -on an x86 host? -Yes, it does trigger with tcg-on-x86 as well. I've been using - -s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on --m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 --drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 --device -scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 --device virtio-net-ccw - -It seems it needs the guest actually doing something with the nics; I -cannot reproduce the crash if I use the old advent calendar moon buggy -image and just add a virtio-net-ccw device. - -(I don't think it's a problem with my local build, as I see the problem -both on my laptop and on an LPAR.) -It looks to me we forget the check the existence of peer. - -Please try the attached patch to see if it works. -Thanks, that patch gets my guest up and running again. So, FWIW, - -Tested-by: Cornelia Huck - -Any idea why this did not hit with virtio-net-pci (or the autogenerated -virtio-net-ccw device)? -It can be hit with virtio-net-pci as well (just start without peer). -Hm, I had not been able to reproduce the crash with a 'naked' -device -virtio-net-pci. But checking seems to be the right idea anyway. -Sorry for being unclear, I meant for networking part, you just need start -without peer, and you need a real guest (any Linux) that is trying to access -the config space of virtio-net. 
- -Thanks -A pxe guest will do it, but that doesn't support ccw, right? -Yes, it depends on the cli actually. -I'm still unclear why this triggers with ccw but not pci - -any idea? -I don't test pxe but I can reproduce this with pci (just start a linux guest -without a peer). - -Thanks -Might be a good addition to a unit test. Not sure what would the -test do exactly: just make sure guest runs? Looks like a lot of work -for an empty test ... maybe we can poke at the guest config with -qtest commands at least. -That should work or we can simply extend the exist virtio-net qtest to -do that. -Thanks - diff --git a/classification_output/01/mistranslation/4412535 b/classification_output/01/mistranslation/4412535 deleted file mode 100644 index 97712c2f..00000000 --- a/classification_output/01/mistranslation/4412535 +++ /dev/null @@ -1,348 +0,0 @@ -mistranslation: 0.800 -other: 0.786 -instruction: 0.751 -semantic: 0.737 - -[BUG] accel/tcg: cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu) - -It seems there is a bug in SIGALRM handling when 486 system emulates x86_64 -code. 
-
-This code:
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <pthread.h>
-#include <signal.h>
-#include <unistd.h>
-
-pthread_t thread1, thread2;
-
-// Signal handler for SIGALRM
-void alarm_handler(int sig) {
-    // Do nothing, just wake up the other thread
-}
-
-// Thread 1 function
-void* thread1_func(void* arg) {
-    // Set up the signal handler for SIGALRM
-    signal(SIGALRM, alarm_handler);
-
-    // Wait for 1 second
-    sleep(1);
-
-    // Send SIGALRM signal to thread 2
-    pthread_kill(thread2, SIGALRM);
-
-    return NULL;
-}
-
-// Thread 2 function
-void* thread2_func(void* arg) {
-    // Wait for the SIGALRM signal
-    pause();
-
-    printf("Thread 2 woke up!\n");
-
-    return NULL;
-}
-
-int main() {
-    // Create thread 1
-    if (pthread_create(&thread1, NULL, thread1_func, NULL) != 0) {
-        fprintf(stderr, "Failed to create thread 1\n");
-        return 1;
-    }
-
-    // Create thread 2
-    if (pthread_create(&thread2, NULL, thread2_func, NULL) != 0) {
-        fprintf(stderr, "Failed to create thread 2\n");
-        return 1;
-    }
-
-    // Wait for both threads to finish
-    pthread_join(thread1, NULL);
-    pthread_join(thread2, NULL);
-
-    return 0;
-}
-
-Fails with this strace log (there are also unsupported syscalls 334 and 435,
-but it seems it doesn't affect the code much):
-
-...
-736 rt_sigaction(SIGALRM,0x000000001123ec20,0x000000001123ecc0) = 0
-736 clock_nanosleep(CLOCK_REALTIME,0,{tv_sec = 1,tv_nsec = 0},{tv_sec = 1,tv_nsec = 0})
-736 rt_sigprocmask(SIG_BLOCK,0x00000000109fad20,0x0000000010800b38,8) = 0
-736 Unknown syscall 435
-736 clone(CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|
- ...
-736 rt_sigprocmask(SIG_SETMASK,0x0000000010800b38,NULL,8) -736 set_robust_list(0x11a419a0,0) = -1 errno=38 (Function not implemented) -736 rt_sigprocmask(SIG_SETMASK,0x0000000011a41fb0,NULL,8) = 0 - = 0 -736 pause(0,0,2,277186368,0,295966400) -736 -futex(0x000000001123f990,FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET,738,NULL,NULL,0) - = 0 -736 rt_sigprocmask(SIG_BLOCK,0x00000000109fad20,0x000000001123ee88,8) = 0 -736 getpid() = 736 -736 tgkill(736,739,SIGALRM) = 0 - = -1 errno=4 (Interrupted system call) ---- SIGALRM {si_signo=SIGALRM, si_code=SI_TKILL, si_pid=736, si_uid=0} --- -0x48874a != 0x3c69e10 -736 rt_sigprocmask(SIG_SETMASK,0x000000001123ee88,NULL,8) = 0 -** -ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: -(cpu == current_cpu) -Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion -failed: (cpu == current_cpu) -0x48874a != 0x3c69e10 -** -ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: -(cpu == current_cpu) -Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion -failed: (cpu == current_cpu) -# - -The code fails either with or without -singlestep, the command line: - -/usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep /opt/x86_64/alarm.bin - -Source code of QEMU 8.1.1 was modified with patch "[PATCH] qemu/timer: Don't -use RDTSC on i486" [1], -with added few ioctls (not relevant) and cpu_exec_longjmp_cleanup() now prints -current pointers of -cpu and current_cpu (line "0x48874a != 0x3c69e10"). 
- -config.log (built as a part of buildroot, basically the minimal possible -configuration for running x86_64 on 486): - -# Configured with: -'/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/build/qemu-8.1.1/configure' - '--prefix=/usr' -'--cross-prefix=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/i486-buildroot-linux-gnu-' - '--audio-drv-list=' -'--python=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/python3' - -'--ninja=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/ninja' -'--disable-alsa' '--disable-bpf' '--disable-brlapi' '--disable-bsd-user' -'--disable-cap-ng' '--disable-capstone' '--disable-containers' -'--disable-coreaudio' '--disable-curl' '--disable-curses' -'--disable-dbus-display' '--disable-docs' '--disable-dsound' '--disable-hvf' -'--disable-jack' '--disable-libiscsi' '--disable-linux-aio' -'--disable-linux-io-uring' '--disable-malloc-trim' '--disable-membarrier' -'--disable-mpath' '--disable-netmap' '--disable-opengl' '--disable-oss' -'--disable-pa' '--disable-rbd' '--disable-sanitizers' '--disable-selinux' -'--disable-sparse' '--disable-strip' '--disable-vde' '--disable-vhost-crypto' -'--disable-vhost-user-blk-server' '--disable-virtfs' '--disable-whpx' -'--disable-xen' '--disable-attr' '--disable-kvm' '--disable-vhost-net' -'--disable-download' '--disable-hexagon-idef-parser' '--disable-system' -'--enable-linux-user' '--target-list=x86_64-linux-user' '--disable-vhost-user' -'--disable-slirp' '--disable-sdl' '--disable-fdt' '--enable-trace-backends=nop' -'--disable-tools' '--disable-guest-agent' '--disable-fuse' -'--disable-fuse-lseek' '--disable-seccomp' '--disable-libssh' -'--disable-libusb' '--disable-vnc' '--disable-nettle' '--disable-numa' -'--disable-pipewire' '--disable-spice' '--disable-usb-redir' -'--disable-install-blobs' - -Emulation of the same x86_64 code with qemu 6.2.0 installed on another x86_64 -native machine works fine. 
- -[1] -https://lists.nongnu.org/archive/html/qemu-devel/2023-11/msg05387.html -Best regards, -Petr - -On Sat, 25 Nov 2023 at 13:09, Petr Cvek wrote: -> -> -It seems there is a bug in SIGALRM handling when 486 system emulates x86_64 -> -code. -486 host is pretty well out of support currently. Can you reproduce -this on a less ancient host CPU type ? - -> -ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: -> -(cpu == current_cpu) -> -Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: -> -assertion failed: (cpu == current_cpu) -> -0x48874a != 0x3c69e10 -> -** -> -ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: -> -(cpu == current_cpu) -> -Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: -> -assertion failed: (cpu == current_cpu) -What compiler version do you build QEMU with? That -assert is there because we have seen some buggy compilers -in the past which don't correctly preserve the variable -value as the setjmp/longjmp spec requires them to. - -thanks --- PMM - -Dne 27. 11. 23 v 10:37 Peter Maydell napsal(a): -> -On Sat, 25 Nov 2023 at 13:09, Petr Cvek wrote: -> -> -> -> It seems there is a bug in SIGALRM handling when 486 system emulates x86_64 -> -> code. -> -> -486 host is pretty well out of support currently. Can you reproduce -> -this on a less ancient host CPU type ? -> -It seems it only fails when the code is compiled for i486. QEMU built with the -same compiler with -march=i586 and above runs on the same physical hardware -without a problem. All -march= variants were executed on ryzen 3600. - -> -> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion -> -> failed: (cpu == current_cpu) -> -> Bail out! 
ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: -> -> assertion failed: (cpu == current_cpu) -> -> 0x48874a != 0x3c69e10 -> -> ** -> -> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion -> -> failed: (cpu == current_cpu) -> -> Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: -> -> assertion failed: (cpu == current_cpu) -> -> -What compiler version do you build QEMU with? That -> -assert is there because we have seen some buggy compilers -> -in the past which don't correctly preserve the variable -> -value as the setjmp/longjmp spec requires them to. -> -i486 and i586+ code variants were compiled with GCC 13.2.0 (more exactly, -slackware64 current multilib distribution). - -i486 binary which runs on the real 486 is also GCC 13.2.0 and installed as a -part of the buildroot crosscompiler (about two week old git snapshot). - -> -thanks -> --- PMM -best regards, -Petr - -On 11/25/23 07:08, Petr Cvek wrote: -ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: -(cpu == current_cpu) -Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion -failed: (cpu == current_cpu) -# - -The code fails either with or without -singlestep, the command line: - -/usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep /opt/x86_64/alarm.bin - -Source code of QEMU 8.1.1 was modified with patch "[PATCH] qemu/timer: Don't use -RDTSC on i486" [1], -with added few ioctls (not relevant) and cpu_exec_longjmp_cleanup() now prints -current pointers of -cpu and current_cpu (line "0x48874a != 0x3c69e10"). -If you try this again with 8.2-rc2, you should not see an assertion failure. -You should see instead - -QEMU internal SIGILL {code=ILLOPC, addr=0x12345678} -which I think more accurately summarizes the situation of attempting RDTSC on hardware -that does not support it. -r~ - -Dne 29. 11. 
23 v 15:25 Richard Henderson napsal(a): -> -On 11/25/23 07:08, Petr Cvek wrote: -> -> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion -> -> failed: (cpu == current_cpu) -> -> Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: -> -> assertion failed: (cpu == current_cpu) -> -> # -> -> -> -> The code fails either with or without -singlestep, the command line: -> -> -> -> /usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep -> -> /opt/x86_64/alarm.bin -> -> -> -> Source code of QEMU 8.1.1 was modified with patch "[PATCH] qemu/timer: Don't -> -> use RDTSC on i486" [1], -> -> with added few ioctls (not relevant) and cpu_exec_longjmp_cleanup() now -> -> prints current pointers of -> -> cpu and current_cpu (line "0x48874a != 0x3c69e10"). -> -> -> -If you try this again with 8.2-rc2, you should not see an assertion failure. -> -You should see instead -> -> -QEMU internal SIGILL {code=ILLOPC, addr=0x12345678} -> -> -which I think more accurately summarizes the situation of attempting RDTSC on -> -hardware that does not support it. -> -> -Compilation of vanilla qemu v8.2.0-rc2 with -march=i486 by GCC 13.2.0 and -running the resulting binary on ryzen still leads to: - -** -ERROR:../accel/tcg/cpu-exec.c:533:cpu_exec_longjmp_cleanup: assertion failed: -(cpu == current_cpu) -Bail out! ERROR:../accel/tcg/cpu-exec.c:533:cpu_exec_longjmp_cleanup: assertion -failed: (cpu == current_cpu) -Aborted - -> -> -r~ -Petr - diff --git a/classification_output/01/mistranslation/5373318 b/classification_output/01/mistranslation/5373318 deleted file mode 100644 index e4d4789c..00000000 --- a/classification_output/01/mistranslation/5373318 +++ /dev/null @@ -1,692 +0,0 @@ -mistranslation: 0.881 -other: 0.839 -instruction: 0.755 -semantic: 0.752 - -[Qemu-devel] [BUG?] aio_get_linux_aio: Assertion `ctx->linux_aio' failed - -Hi, - -I am seeing some strange QEMU assertion failures for qemu on s390x, -which prevents a guest from starting. 
- -Git bisecting points to the following commit as the source of the error. - -commit ed6e2161715c527330f936d44af4c547f25f687e -Author: Nishanth Aravamudan -Date: Fri Jun 22 12:37:00 2018 -0700 - - linux-aio: properly bubble up errors from initialization - - laio_init() can fail for a couple of reasons, which will lead to a NULL - pointer dereference in laio_attach_aio_context(). - - To solve this, add a aio_setup_linux_aio() function which is called - early in raw_open_common. If this fails, propagate the error up. The - signature of aio_get_linux_aio() was not modified, because it seems - preferable to return the actual errno from the possible failing - initialization calls. - - Additionally, when the AioContext changes, we need to associate a - LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context - callback and call the new aio_setup_linux_aio(), which will allocate a -new AioContext if needed, and return errors on failures. If it -fails for -any reason, fallback to threaded AIO with an error message, as the - device is already in-use by the guest. - - Add an assert that aio_get_linux_aio() cannot return NULL. - - Signed-off-by: Nishanth Aravamudan - Message-id: address@hidden - Signed-off-by: Stefan Hajnoczi -Not sure what is causing this assertion to fail. 
Here is the qemu -command line of the guest, from qemu log, which throws this error: -LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin -QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-s390x -name -guest=rt_vm1,debug-threads=on -S -object -secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-rt_vm1/master-key.aes --machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off -m -1024 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object -iothread,id=iothread1 -uuid 0cde16cd-091d-41bd-9ac2-5243df5c9a0d --display none -no-user-config -nodefaults -chardev -socket,id=charmonitor,fd=28,server,nowait -mon -chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown --boot strict=on -drive -file=/dev/mapper/360050763998b0883980000002a000031,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native --device -virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on --netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device -virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:3a:c8:67:95:84,devno=fe.0.0000 --netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 -device -virtio-net-ccw,netdev=hostnet1,id=net1,mac=52:54:00:2a:e5:08,devno=fe.0.0002 --chardev pty,id=charconsole0 -device -sclpconsole,chardev=charconsole0,id=console0 -device -virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -sandbox -on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny --msg timestamp=on -2018-07-17 15:48:42.252+0000: Domain id=21 is tainted: high-privileges -2018-07-17T15:48:42.279380Z qemu-system-s390x: -chardev -pty,id=charconsole0: char device redirected to /dev/pts/3 (label -charconsole0) -qemu-system-s390x: util/async.c:339: aio_get_linux_aio: Assertion -`ctx->linux_aio' failed. -2018-07-17 15:48:43.309+0000: shutting down, reason=failed - - -Any help debugging this would be greatly appreciated. 
- -Thank you -Farhan - -On 17.07.2018 [13:25:53 -0400], Farhan Ali wrote: -> -Hi, -> -> -I am seeing some strange QEMU assertion failures for qemu on s390x, -> -which prevents a guest from starting. -> -> -Git bisecting points to the following commit as the source of the error. -> -> -commit ed6e2161715c527330f936d44af4c547f25f687e -> -Author: Nishanth Aravamudan -> -Date: Fri Jun 22 12:37:00 2018 -0700 -> -> -linux-aio: properly bubble up errors from initialization -> -> -laio_init() can fail for a couple of reasons, which will lead to a NULL -> -pointer dereference in laio_attach_aio_context(). -> -> -To solve this, add a aio_setup_linux_aio() function which is called -> -early in raw_open_common. If this fails, propagate the error up. The -> -signature of aio_get_linux_aio() was not modified, because it seems -> -preferable to return the actual errno from the possible failing -> -initialization calls. -> -> -Additionally, when the AioContext changes, we need to associate a -> -LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context -> -callback and call the new aio_setup_linux_aio(), which will allocate a -> -new AioContext if needed, and return errors on failures. If it fails for -> -any reason, fallback to threaded AIO with an error message, as the -> -device is already in-use by the guest. -> -> -Add an assert that aio_get_linux_aio() cannot return NULL. -> -> -Signed-off-by: Nishanth Aravamudan -> -Message-id: address@hidden -> -Signed-off-by: Stefan Hajnoczi -> -> -> -Not sure what is causing this assertion to fail. 
Here is the qemu command -> -line of the guest, from qemu log, which throws this error: -> -> -> -LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin -> -QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-s390x -name -> -guest=rt_vm1,debug-threads=on -S -object -> -secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-rt_vm1/master-key.aes -> --machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off -m 1024 -> --realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object -> -iothread,id=iothread1 -uuid 0cde16cd-091d-41bd-9ac2-5243df5c9a0d -display -> -none -no-user-config -nodefaults -chardev -> -socket,id=charmonitor,fd=28,server,nowait -mon -> -chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot -> -strict=on -drive -> -file=/dev/mapper/360050763998b0883980000002a000031,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native -> --device -> -virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on -> --netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device -> -virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:3a:c8:67:95:84,devno=fe.0.0000 -> --netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 -device -> -virtio-net-ccw,netdev=hostnet1,id=net1,mac=52:54:00:2a:e5:08,devno=fe.0.0002 -> --chardev pty,id=charconsole0 -device -> -sclpconsole,chardev=charconsole0,id=console0 -device -> -virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -sandbox -> -on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg -> -timestamp=on -> -> -> -> -2018-07-17 15:48:42.252+0000: Domain id=21 is tainted: high-privileges -> -2018-07-17T15:48:42.279380Z qemu-system-s390x: -chardev pty,id=charconsole0: -> -char device redirected to /dev/pts/3 (label charconsole0) -> -qemu-system-s390x: util/async.c:339: aio_get_linux_aio: Assertion -> -`ctx->linux_aio' failed. 
-> -2018-07-17 15:48:43.309+0000: shutting down, reason=failed -> -> -> -Any help debugging this would be greatly appreciated. -iiuc, this possibly implies AIO was not actually used previously on this -guest (it might have silently been falling back to threaded IO?). I -don't have access to s390x, but would it be possible to run qemu under -gdb and see if aio_setup_linux_aio is being called at all (I think it -might not be, but I'm not sure why), and if so, if it's for the context -in question? - -If it's not being called first, could you see what callpath is calling -aio_get_linux_aio when this assertion trips? - -Thanks! --Nish - -On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: -iiuc, this possibly implies AIO was not actually used previously on this -guest (it might have silently been falling back to threaded IO?). I -don't have access to s390x, but would it be possible to run qemu under -gdb and see if aio_setup_linux_aio is being called at all (I think it -might not be, but I'm not sure why), and if so, if it's for the context -in question? - -If it's not being called first, could you see what callpath is calling -aio_get_linux_aio when this assertion trips? - -Thanks! 
--Nish -Hi Nishanth, -From the coredump of the guest this is the call trace that calls -aio_get_linux_aio: -Stack trace of thread 145158: -#0 0x000003ff94dbe274 raise (libc.so.6) -#1 0x000003ff94da39a8 abort (libc.so.6) -#2 0x000003ff94db62ce __assert_fail_base (libc.so.6) -#3 0x000003ff94db634c __assert_fail (libc.so.6) -#4 0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) -#5 0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) -#6 0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) -#7 0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) -#8 0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) -#9 0x000002aa20db3c34 aio_poll (qemu-system-s390x) -#10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) -#11 0x000003ff94f879a8 start_thread (libpthread.so.0) -#12 0x000003ff94e797ee thread_start (libc.so.6) - - -Thanks for taking a look and responding. - -Thanks -Farhan - -On 07/18/2018 09:42 AM, Farhan Ali wrote: -On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: -iiuc, this possibly implies AIO was not actually used previously on this -guest (it might have silently been falling back to threaded IO?). I -don't have access to s390x, but would it be possible to run qemu under -gdb and see if aio_setup_linux_aio is being called at all (I think it -might not be, but I'm not sure why), and if so, if it's for the context -in question? - -If it's not being called first, could you see what callpath is calling -aio_get_linux_aio when this assertion trips? - -Thanks!
--Nish -Hi Nishant, -From the coredump of the guest this is the call trace that calls -aio_get_linux_aio: -Stack trace of thread 145158: -#0  0x000003ff94dbe274 raise (libc.so.6) -#1  0x000003ff94da39a8 abort (libc.so.6) -#2  0x000003ff94db62ce __assert_fail_base (libc.so.6) -#3  0x000003ff94db634c __assert_fail (libc.so.6) -#4  0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) -#5  0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) -#6  0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) -#7  0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) -#8  0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) -#9  0x000002aa20db3c34 aio_poll (qemu-system-s390x) -#10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) -#11 0x000003ff94f879a8 start_thread (libpthread.so.0) -#12 0x000003ff94e797ee thread_start (libc.so.6) - - -Thanks for taking a look and responding. - -Thanks -Farhan -Trying to debug a little further, the block device in this case is a -"host device". And looking at your commit carefully you use the -bdrv_attach_aio_context callback to setup a Linux AioContext. -For some reason the "host device" struct (BlockDriver bdrv_host_device -in block/file-posix.c) does not have a bdrv_attach_aio_context defined. -So a simple change of adding the callback to the struct solves the issue -and the guest starts fine. -diff --git a/block/file-posix.c b/block/file-posix.c -index 28824aa..b8d59fb 100644 ---- a/block/file-posix.c -+++ b/block/file-posix.c -@@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { - .bdrv_refresh_limits = raw_refresh_limits, - .bdrv_io_plug = raw_aio_plug, - .bdrv_io_unplug = raw_aio_unplug, -+ .bdrv_attach_aio_context = raw_aio_attach_aio_context, - - .bdrv_co_truncate = raw_co_truncate, - .bdrv_getlength = raw_getlength, -I am not too familiar with block device code in QEMU, so not sure if -this is the right fix or if there are some underlying problems. 
-Thanks -Farhan - -On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: -> -> -> -On 07/18/2018 09:42 AM, Farhan Ali wrote: -> -> -> -> -> -> On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: -> -> > iiuc, this possibly implies AIO was not actually used previously on this -> -> > guest (it might have silently been falling back to threaded IO?). I -> -> > don't have access to s390x, but would it be possible to run qemu under -> -> > gdb and see if aio_setup_linux_aio is being called at all (I think it -> -> > might not be, but I'm not sure why), and if so, if it's for the context -> -> > in question? -> -> > -> -> > If it's not being called first, could you see what callpath is calling -> -> > aio_get_linux_aio when this assertion trips? -> -> > -> -> > Thanks! -> -> > -Nish -> -> -> -> -> -> Hi Nishant, -> -> -> -> From the coredump of the guest this is the call trace that calls -> -> aio_get_linux_aio: -> -> -> -> -> -> Stack trace of thread 145158: -> -> #0  0x000003ff94dbe274 raise (libc.so.6) -> -> #1  0x000003ff94da39a8 abort (libc.so.6) -> -> #2  0x000003ff94db62ce __assert_fail_base (libc.so.6) -> -> #3  0x000003ff94db634c __assert_fail (libc.so.6) -> -> #4  0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) -> -> #5  0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) -> -> #6  0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) -> -> #7  0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) -> -> #8  0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) -> -> #9  0x000002aa20db3c34 aio_poll (qemu-system-s390x) -> -> #10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) -> -> #11 0x000003ff94f879a8 start_thread (libpthread.so.0) -> -> #12 0x000003ff94e797ee thread_start (libc.so.6) -> -> -> -> -> -> Thanks for taking a look and responding. -> -> -> -> Thanks -> -> Farhan -> -> -> -> -> -> -> -> -Trying to debug a little further, the block device in this case is a "host -> -device". 
And looking at your commit carefully you use the -> -bdrv_attach_aio_context callback to setup a Linux AioContext. -> -> -For some reason the "host device" struct (BlockDriver bdrv_host_device in -> -block/file-posix.c) does not have a bdrv_attach_aio_context defined. -> -So a simple change of adding the callback to the struct solves the issue and -> -the guest starts fine. -> -> -> -diff --git a/block/file-posix.c b/block/file-posix.c -> -index 28824aa..b8d59fb 100644 -> ---- a/block/file-posix.c -> -+++ b/block/file-posix.c -> -@@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { -> -.bdrv_refresh_limits = raw_refresh_limits, -> -.bdrv_io_plug = raw_aio_plug, -> -.bdrv_io_unplug = raw_aio_unplug, -> -+ .bdrv_attach_aio_context = raw_aio_attach_aio_context, -> -> -.bdrv_co_truncate = raw_co_truncate, -> -.bdrv_getlength = raw_getlength, -> -> -> -> -I am not too familiar with block device code in QEMU, so not sure if -> -this is the right fix or if there are some underlying problems. -Oh this is quite embarrassing! I only added the bdrv_attach_aio_context -callback for the file-backed device. Your fix is definitely correct for -host device. Let me make sure there weren't any others missed and I will -send out a properly formatted patch. Thank you for the quick testing and -turnaround! - --Nish - -On 07/18/2018 08:52 PM, Nishanth Aravamudan wrote: -> -On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: -> -> -> -> -> -> On 07/18/2018 09:42 AM, Farhan Ali wrote: -> ->> -> ->> -> ->> On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: -> ->>> iiuc, this possibly implies AIO was not actually used previously on this -> ->>> guest (it might have silently been falling back to threaded IO?). I -> ->>> don't have access to s390x, but would it be possible to run qemu under -> ->>> gdb and see if aio_setup_linux_aio is being called at all (I think it -> ->>> might not be, but I'm not sure why), and if so, if it's for the context -> ->>> in question?
-> ->>> -> ->>> If it's not being called first, could you see what callpath is calling -> ->>> aio_get_linux_aio when this assertion trips? -> ->>> -> ->>> Thanks! -> ->>> -Nish -> ->> -> ->> -> ->> Hi Nishant, -> ->> -> ->> From the coredump of the guest this is the call trace that calls -> ->> aio_get_linux_aio: -> ->> -> ->> -> ->> Stack trace of thread 145158: -> ->> #0  0x000003ff94dbe274 raise (libc.so.6) -> ->> #1  0x000003ff94da39a8 abort (libc.so.6) -> ->> #2  0x000003ff94db62ce __assert_fail_base (libc.so.6) -> ->> #3  0x000003ff94db634c __assert_fail (libc.so.6) -> ->> #4  0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) -> ->> #5  0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) -> ->> #6  0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) -> ->> #7  0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) -> ->> #8  0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) -> ->> #9  0x000002aa20db3c34 aio_poll (qemu-system-s390x) -> ->> #10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) -> ->> #11 0x000003ff94f879a8 start_thread (libpthread.so.0) -> ->> #12 0x000003ff94e797ee thread_start (libc.so.6) -> ->> -> ->> -> ->> Thanks for taking a look and responding. -> ->> -> ->> Thanks -> ->> Farhan -> ->> -> ->> -> ->> -> -> -> -> Trying to debug a little further, the block device in this case is a "host -> -> device". And looking at your commit carefully you use the -> -> bdrv_attach_aio_context callback to setup a Linux AioContext. -> -> -> -> For some reason the "host device" struct (BlockDriver bdrv_host_device in -> -> block/file-posix.c) does not have a bdrv_attach_aio_context defined. -> -> So a simple change of adding the callback to the struct solves the issue and -> -> the guest starts fine. 
-> -> -> -> -> -> diff --git a/block/file-posix.c b/block/file-posix.c -> -> index 28824aa..b8d59fb 100644 -> -> --- a/block/file-posix.c -> -> +++ b/block/file-posix.c -> -> @@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { -> -> .bdrv_refresh_limits = raw_refresh_limits, -> -> .bdrv_io_plug = raw_aio_plug, -> -> .bdrv_io_unplug = raw_aio_unplug, -> -> + .bdrv_attach_aio_context = raw_aio_attach_aio_context, -> -> -> -> .bdrv_co_truncate = raw_co_truncate, -> -> .bdrv_getlength = raw_getlength, -> -> -> -> -> -> -> -> I am not too familiar with block device code in QEMU, so not sure if -> -> this is the right fix or if there are some underlying problems. -> -> -Oh this is quite embarassing! I only added the bdrv_attach_aio_context -> -callback for the file-backed device. Your fix is definitely corect for -> -host device. Let me make sure there weren't any others missed and I will -> -send out a properly formatted patch. Thank you for the quick testing and -> -turnaround! -Farhan, can you respin your patch with proper sign-off and patch description? -Adding qemu-block. - -Hi Christian, - -On 19.07.2018 [08:55:20 +0200], Christian Borntraeger wrote: -> -> -> -On 07/18/2018 08:52 PM, Nishanth Aravamudan wrote: -> -> On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: -> ->> -> ->> -> ->> On 07/18/2018 09:42 AM, Farhan Ali wrote: - - -> ->> I am not too familiar with block device code in QEMU, so not sure if -> ->> this is the right fix or if there are some underlying problems. -> -> -> -> Oh this is quite embarassing! I only added the bdrv_attach_aio_context -> -> callback for the file-backed device. Your fix is definitely corect for -> -> host device. Let me make sure there weren't any others missed and I will -> -> send out a properly formatted patch. Thank you for the quick testing and -> -> turnaround! -> -> -Farhan, can you respin your patch with proper sign-off and patch description? -> -Adding qemu-block. 
-I sent it yesterday, sorry I didn't cc everyone from this e-mail: -http://lists.nongnu.org/archive/html/qemu-block/2018-07/msg00516.html -Thanks, -Nish - diff --git a/classification_output/01/mistranslation/5798945 b/classification_output/01/mistranslation/5798945 deleted file mode 100644 index 95c3f61d..00000000 --- a/classification_output/01/mistranslation/5798945 +++ /dev/null @@ -1,43 +0,0 @@ -mistranslation: 0.472 -semantic: 0.387 -other: 0.345 -instruction: 0.261 - -[BUG][CPU hot-plug]CPU hot-plugs cause the qemu process to coredump - -Hello,Recently, when I was developing CPU hot-plugs under the loongarch -architecture, -I found that there was a problem with qemu cpu hot-plugs under x86 -architecture, -which caused the qemu process coredump when repeatedly inserting and -unplugging -the CPU when the TCG was accelerated. - - -The specific operation process is as follows: - -1.Use the following command to start the virtual machine - -qemu-system-x86_64 \ --machine q35  \ --cpu Broadwell-IBRS \ --smp 1,maxcpus=4,sockets=4,cores=1,threads=1 \ --m 4G \ --drive file=~/anolis-8.8.qcow2  \ --serial stdio   \ --monitor telnet:localhost:4498,server,nowait - - -2.Enter QEMU Monitor via telnet for repeated CPU insertion and unplugging - -telnet 127.0.0.1 4498 -(qemu) device_add -Broadwell-IBRS-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=cpu1 -(qemu) device_del cpu1 -(qemu) device_add -Broadwell-IBRS-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=cpu1 -3.You will notice that the QEMU process has a coredump - -# malloc(): unsorted double linked list corrupted -Aborted (core dumped) - diff --git a/classification_output/01/mistranslation/5933279 b/classification_output/01/mistranslation/5933279 deleted file mode 100644 index 719c03c7..00000000 --- a/classification_output/01/mistranslation/5933279 +++ /dev/null @@ -1,4581 +0,0 @@ -mistranslation: 0.962 -instruction: 0.930 -other: 0.930 -semantic: 0.923 - -[BUG, RFC] cpr-transfer: qxl guest driver crashes after 
migration - -Hi all, - -We've been experimenting with cpr-transfer migration mode recently and -have discovered the following issue with the guest QXL driver: - -Run migration source: -> -EMULATOR=/path/to/emulator -> -ROOTFS=/path/to/image -> -QMPSOCK=/var/run/alma8qmp-src.sock -> -> -$EMULATOR -enable-kvm \ -> --machine q35 \ -> --cpu host -smp 2 -m 2G \ -> --object -> -memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ -> --machine memory-backend=ram0 \ -> --machine aux-ram-share=on \ -> --drive file=$ROOTFS,media=disk,if=virtio \ -> --qmp unix:$QMPSOCK,server=on,wait=off \ -> --nographic \ -> --device qxl-vga -Run migration target: -> -EMULATOR=/path/to/emulator -> -ROOTFS=/path/to/image -> -QMPSOCK=/var/run/alma8qmp-dst.sock -> -> -> -> -$EMULATOR -enable-kvm \ -> --machine q35 \ -> --cpu host -smp 2 -m 2G \ -> --object -> -memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ -> --machine memory-backend=ram0 \ -> --machine aux-ram-share=on \ -> --drive file=$ROOTFS,media=disk,if=virtio \ -> --qmp unix:$QMPSOCK,server=on,wait=off \ -> --nographic \ -> --device qxl-vga \ -> --incoming tcp:0:44444 \ -> --incoming '{"channel-type": "cpr", "addr": { "transport": "socket", -> -"type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -Launch the migration: -> -QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -> -QMPSOCK=/var/run/alma8qmp-src.sock -> -> -$QMPSHELL -p $QMPSOCK < -migrate-set-parameters mode=cpr-transfer -> -migrate -> -channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}] -> -EOF -Then, after a while, QXL guest driver on target crashes spewing the -following messages: -> -[ 73.962002] [TTM] Buffer eviction failed -> -[ 73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001) -> -[ 73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to 
allocate -> -VRAM BO -That seems to be a known kernel QXL driver bug: -https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -(the latter discussion contains that reproduce script which speeds up -the crash in the guest): -> -#!/bin/bash -> -> -chvt 3 -> -> -for j in $(seq 80); do -> -echo "$(date) starting round $j" -> -if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" -> -]; then -> -echo "bug was reproduced after $j tries" -> -exit 1 -> -fi -> -for i in $(seq 100); do -> -dmesg > /dev/tty3 -> -done -> -done -> -> -echo "bug could not be reproduced" -> -exit 0 -The bug itself seems to remain unfixed, as I was able to reproduce that -with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -cpr-transfer code also seems to be buggy as it triggers the crash - -without the cpr-transfer migration the above reproduce doesn't lead to -crash on the source VM. - -I suspect that, as cpr-transfer doesn't migrate the guest memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM. However, I wasn't able to trace the -corruption so far. - -Could somebody help the investigation and take a look into this? Any -suggestions would be appreciated. Thanks! 
- -Andrey - -On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -Hi all, - -We've been experimenting with cpr-transfer migration mode recently and -have discovered the following issue with the guest QXL driver: - -Run migration source: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-src.sock - -$EMULATOR -enable-kvm \ - -machine q35 \ - -cpu host -smp 2 -m 2G \ - -object -memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ - -machine memory-backend=ram0 \ - -machine aux-ram-share=on \ - -drive file=$ROOTFS,media=disk,if=virtio \ - -qmp unix:$QMPSOCK,server=on,wait=off \ - -nographic \ - -device qxl-vga -Run migration target: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-dst.sock -$EMULATOR -enable-kvm \ --machine q35 \ - -cpu host -smp 2 -m 2G \ - -object -memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ - -machine memory-backend=ram0 \ - -machine aux-ram-share=on \ - -drive file=$ROOTFS,media=disk,if=virtio \ - -qmp unix:$QMPSOCK,server=on,wait=off \ - -nographic \ - -device qxl-vga \ - -incoming tcp:0:44444 \ - -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", -"path": "/var/run/alma8cpr-dst.sock"}}' -Launch the migration: -QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -QMPSOCK=/var/run/alma8qmp-src.sock - -$QMPSHELL -p $QMPSOCK < /dev/tty3 - done -done - -echo "bug could not be reproduced" -exit 0 -The bug itself seems to remain unfixed, as I was able to reproduce that -with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -cpr-transfer code also seems to be buggy as it triggers the crash - -without the cpr-transfer migration the above reproduce doesn't lead to -crash on the source VM. - -I suspect that, as cpr-transfer doesn't migrate the guest memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM. However, I wasn't able to trace the -corruption so far. 
- -Could somebody help the investigation and take a look into this? Any -suggestions would be appreciated. Thanks! -Possibly some memory region created by qxl is not being preserved. -Try adding these traces to see what is preserved: - --trace enable='*cpr*' --trace enable='*ram_alloc*' - -- Steve - -On 2/28/2025 1:13 PM, Steven Sistare wrote: -On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -Hi all, - -We've been experimenting with cpr-transfer migration mode recently and -have discovered the following issue with the guest QXL driver: - -Run migration source: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-src.sock - -$EMULATOR -enable-kvm \ -     -machine q35 \ -     -cpu host -smp 2 -m 2G \ -     -object -memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ -     -machine memory-backend=ram0 \ -     -machine aux-ram-share=on \ -     -drive file=$ROOTFS,media=disk,if=virtio \ -     -qmp unix:$QMPSOCK,server=on,wait=off \ -     -nographic \ -     -device qxl-vga -Run migration target: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-dst.sock -$EMULATOR -enable-kvm \ -     -machine q35 \ -     -cpu host -smp 2 -m 2G \ -     -object -memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ -     -machine memory-backend=ram0 \ -     -machine aux-ram-share=on \ -     -drive file=$ROOTFS,media=disk,if=virtio \ -     -qmp unix:$QMPSOCK,server=on,wait=off \ -     -nographic \ -     -device qxl-vga \ -     -incoming tcp:0:44444 \ -     -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", -"path": "/var/run/alma8cpr-dst.sock"}}' -Launch the migration: -QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -QMPSOCK=/var/run/alma8qmp-src.sock - -$QMPSHELL -p $QMPSOCK < /dev/tty3 -         done -done - -echo "bug could not be reproduced" -exit 0 -The bug itself seems to remain unfixed, as I was able to reproduce that -with Fedora 41 guest, as well as AlmaLinux 8 
guest. However our -cpr-transfer code also seems to be buggy as it triggers the crash - -without the cpr-transfer migration the above reproduce doesn't lead to -crash on the source VM. - -I suspect that, as cpr-transfer doesn't migrate the guest memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM.  However, I wasn't able to trace the -corruption so far. - -Could somebody help the investigation and take a look into this?  Any -suggestions would be appreciated.  Thanks! -Possibly some memory region created by qxl is not being preserved. -Try adding these traces to see what is preserved: - --trace enable='*cpr*' --trace enable='*ram_alloc*' -Also try adding this patch to see if it flags any ram blocks as not -compatible with cpr. A message is printed at migration start time. -1740667681-257312-1-git-send-email-steven.sistare@oracle.com -/">https://lore.kernel.org/qemu-devel/ -1740667681-257312-1-git-send-email-steven.sistare@oracle.com -/ -- Steve - -On 2/28/25 8:20 PM, Steven Sistare wrote: -> -On 2/28/2025 1:13 PM, Steven Sistare wrote: -> -> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -> ->> Hi all, -> ->> -> ->> We've been experimenting with cpr-transfer migration mode recently and -> ->> have discovered the following issue with the guest QXL driver: -> ->> -> ->> Run migration source: -> ->>> EMULATOR=/path/to/emulator -> ->>> ROOTFS=/path/to/image -> ->>> QMPSOCK=/var/run/alma8qmp-src.sock -> ->>> -> ->>> $EMULATOR -enable-kvm \ -> ->>>      -machine q35 \ -> ->>>      -cpu host -smp 2 -m 2G \ -> ->>>      -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -> ->>> ram0,share=on\ -> ->>>      -machine memory-backend=ram0 \ -> ->>>      -machine aux-ram-share=on \ -> ->>>      -drive file=$ROOTFS,media=disk,if=virtio \ -> ->>>      -qmp unix:$QMPSOCK,server=on,wait=off \ -> ->>>      -nographic \ -> ->>>      -device qxl-vga -> ->> -> ->> Run migration target: -> ->>> EMULATOR=/path/to/emulator 
-> ->>> ROOTFS=/path/to/image -> ->>> QMPSOCK=/var/run/alma8qmp-dst.sock -> ->>> $EMULATOR -enable-kvm \ -> ->>>      -machine q35 \ -> ->>>      -cpu host -smp 2 -m 2G \ -> ->>>      -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -> ->>> ram0,share=on\ -> ->>>      -machine memory-backend=ram0 \ -> ->>>      -machine aux-ram-share=on \ -> ->>>      -drive file=$ROOTFS,media=disk,if=virtio \ -> ->>>      -qmp unix:$QMPSOCK,server=on,wait=off \ -> ->>>      -nographic \ -> ->>>      -device qxl-vga \ -> ->>>      -incoming tcp:0:44444 \ -> ->>>      -incoming '{"channel-type": "cpr", "addr": { "transport": -> ->>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -> ->> -> ->> -> ->> Launch the migration: -> ->>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -> ->>> QMPSOCK=/var/run/alma8qmp-src.sock -> ->>> -> ->>> $QMPSHELL -p $QMPSOCK < ->>>      migrate-set-parameters mode=cpr-transfer -> ->>>      migrate channels=[{"channel-type":"main","addr": -> ->>> {"transport":"socket","type":"inet","host":"0","port":"44444"}}, -> ->>> {"channel-type":"cpr","addr": -> ->>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr- -> ->>> dst.sock"}}] -> ->>> EOF -> ->> -> ->> Then, after a while, QXL guest driver on target crashes spewing the -> ->> following messages: -> ->>> [   73.962002] [TTM] Buffer eviction failed -> ->>> [   73.962072] qxl 0000:00:02.0: object_init failed for (3149824, -> ->>> 0x00000001) -> ->>> [   73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to -> ->>> allocate VRAM BO -> ->> -> ->> That seems to be a known kernel QXL driver bug: -> ->> -> ->> -https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ -> ->> -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -> ->> -> ->> (the latter discussion contains that reproduce script which speeds up -> ->> the crash in the guest): -> ->>> #!/bin/bash -> ->>> -> ->>> chvt 3 -> ->>> -> ->>> for j in $(seq 80); do -> ->>>       
   echo "$(date) starting round $j" -> ->>>          if [ "$(journalctl --boot | grep "failed to allocate VRAM -> ->>> BO")" != "" ]; then -> ->>>                  echo "bug was reproduced after $j tries" -> ->>>                  exit 1 -> ->>>          fi -> ->>>          for i in $(seq 100); do -> ->>>                  dmesg > /dev/tty3 -> ->>>          done -> ->>> done -> ->>> -> ->>> echo "bug could not be reproduced" -> ->>> exit 0 -> ->> -> ->> The bug itself seems to remain unfixed, as I was able to reproduce that -> ->> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -> ->> cpr-transfer code also seems to be buggy as it triggers the crash - -> ->> without the cpr-transfer migration the above reproduce doesn't lead to -> ->> crash on the source VM. -> ->> -> ->> I suspect that, as cpr-transfer doesn't migrate the guest memory, but -> ->> rather passes it through the memory backend object, our code might -> ->> somehow corrupt the VRAM.  However, I wasn't able to trace the -> ->> corruption so far. -> ->> -> ->> Could somebody help the investigation and take a look into this?  Any -> ->> suggestions would be appreciated.  Thanks! -> -> -> -> Possibly some memory region created by qxl is not being preserved. -> -> Try adding these traces to see what is preserved: -> -> -> -> -trace enable='*cpr*' -> -> -trace enable='*ram_alloc*' -> -> -Also try adding this patch to see if it flags any ram blocks as not -> -compatible with cpr.  A message is printed at migration start time. 
-> - -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email- -> -steven.sistare@oracle.com/ -> -> -- Steve -> -With the traces enabled + the "migration: ram block cpr blockers" patch -applied: - -Source: -> -cpr_find_fd pc.bios, id 0 returns -1 -> -cpr_save_fd pc.bios, id 0, fd 22 -> -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -> -0x7fec18e00000 -> -cpr_find_fd pc.rom, id 0 returns -1 -> -cpr_save_fd pc.rom, id 0, fd 23 -> -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -> -0x7fec18c00000 -> -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -> -cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -> -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd -> -24 host 0x7fec18a00000 -> -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -> -cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -> -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 -> -fd 25 host 0x7feb77e00000 -> -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -> -cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -> -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 27 -> -host 0x7fec18800000 -> -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -> -cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -> -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 -> -fd 28 host 0x7feb73c00000 -> -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -> -cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -> -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 34 -> -host 0x7fec18600000 -> -cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -> -cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 -> -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 35 -> -host 0x7fec18200000 -> -cpr_find_fd /rom@etc/table-loader, id 0 returns -1 -> -cpr_save_fd /rom@etc/table-loader, id 0, fd 36 -> -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 
max_size 65536 fd 36 -> -host 0x7feb8b600000 -> -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -> -cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 -> -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 37 host -> -0x7feb8b400000 -> -> -cpr_state_save cpr-transfer mode -> -cpr_transfer_output /var/run/alma8cpr-dst.sock -Target: -> -cpr_transfer_input /var/run/alma8cpr-dst.sock -> -cpr_state_load cpr-transfer mode -> -cpr_find_fd pc.bios, id 0 returns 20 -> -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -> -0x7fcdc9800000 -> -cpr_find_fd pc.rom, id 0 returns 19 -> -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -> -0x7fcdc9600000 -> -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -> -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd -> -18 host 0x7fcdc9400000 -> -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 -> -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 -> -fd 17 host 0x7fcd27e00000 -> -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -> -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 16 -> -host 0x7fcdc9200000 -> -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 -> -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 -> -fd 15 host 0x7fcd23c00000 -> -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 -> -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 14 -> -host 0x7fcdc8800000 -> -cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -> -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 13 -> -host 0x7fcdc8400000 -> -cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -> -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 11 -> -host 0x7fcdc8200000 -> -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 -> -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 10 host -> -0x7fcd3be00000 -Looks like both vga.vram 
and qxl.vram are being preserved (with the same -addresses), and no incompatible ram blocks are found during migration. - -Andrey - -On 2/28/25 8:35 PM, Andrey Drobyshev wrote: -> -On 2/28/25 8:20 PM, Steven Sistare wrote: -> -> On 2/28/2025 1:13 PM, Steven Sistare wrote: -> ->> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -> ->>> Hi all, -> ->>> -> ->>> We've been experimenting with cpr-transfer migration mode recently and -> ->>> have discovered the following issue with the guest QXL driver: -> ->>> -> ->>> Run migration source: -> ->>>> EMULATOR=/path/to/emulator -> ->>>> ROOTFS=/path/to/image -> ->>>> QMPSOCK=/var/run/alma8qmp-src.sock -> ->>>> -> ->>>> $EMULATOR -enable-kvm \ -> ->>>>      -machine q35 \ -> ->>>>      -cpu host -smp 2 -m 2G \ -> ->>>>      -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -> ->>>> ram0,share=on\ -> ->>>>      -machine memory-backend=ram0 \ -> ->>>>      -machine aux-ram-share=on \ -> ->>>>      -drive file=$ROOTFS,media=disk,if=virtio \ -> ->>>>      -qmp unix:$QMPSOCK,server=on,wait=off \ -> ->>>>      -nographic \ -> ->>>>      -device qxl-vga -> ->>> -> ->>> Run migration target: -> ->>>> EMULATOR=/path/to/emulator -> ->>>> ROOTFS=/path/to/image -> ->>>> QMPSOCK=/var/run/alma8qmp-dst.sock -> ->>>> $EMULATOR -enable-kvm \ -> ->>>>      -machine q35 \ -> ->>>>      -cpu host -smp 2 -m 2G \ -> ->>>>      -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ -> ->>>> ram0,share=on\ -> ->>>>      -machine memory-backend=ram0 \ -> ->>>>      -machine aux-ram-share=on \ -> ->>>>      -drive file=$ROOTFS,media=disk,if=virtio \ -> ->>>>      -qmp unix:$QMPSOCK,server=on,wait=off \ -> ->>>>      -nographic \ -> ->>>>      -device qxl-vga \ -> ->>>>      -incoming tcp:0:44444 \ -> ->>>>      -incoming '{"channel-type": "cpr", "addr": { "transport": -> ->>>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -> ->>> -> ->>> -> ->>> Launch the migration: -> ->>>> 
QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
-> ->>>> QMPSOCK=/var/run/alma8qmp-src.sock
-> ->>>>
-> ->>>> $QMPSHELL -p $QMPSOCK <<EOF
-> ->>>>      migrate-set-parameters mode=cpr-transfer
-> ->>>>      migrate channels=[{"channel-type":"main","addr":
-> ->>>> {"transport":"socket","type":"inet","host":"0","port":"44444"}},
-> ->>>> {"channel-type":"cpr","addr":
-> ->>>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr-
-> ->>>> dst.sock"}}]
-> ->>>> EOF
-> ->>>
-> ->>> Then, after a while, the QXL guest driver on the target crashes, spewing the
-> ->>> following messages:
-> ->>>> [   73.962002] [TTM] Buffer eviction failed
-> ->>>> [   73.962072] qxl 0000:00:02.0: object_init failed for (3149824,
-> ->>>> 0x00000001)
-> ->>>> [   73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to
-> ->>>> allocate VRAM BO
-> ->>>
-> ->>> That seems to be a known kernel QXL driver bug:
-> ->>>
-> ->>> -https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/
-> ->>> -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
-> ->>>
-> ->>> (the latter discussion contains the reproducer script which speeds up
-> ->>> the crash in the guest):
-> ->>>> #!/bin/bash
-> ->>>>
-> ->>>> chvt 3
-> ->>>>
-> ->>>> for j in $(seq 80); do
-> ->>>>          echo "$(date) starting round $j"
-> ->>>>          if [ "$(journalctl --boot | grep "failed to allocate VRAM
-> ->>>> BO")" != "" ]; then
-> ->>>>                  echo "bug was reproduced after $j tries"
-> ->>>>                  exit 1
-> ->>>>          fi
-> ->>>>          for i in $(seq 100); do
-> ->>>>                  dmesg > /dev/tty3
-> ->>>>          done
-> ->>>> done
-> ->>>>
-> ->>>> echo "bug could not be reproduced"
-> ->>>> exit 0
-> ->>>
-> ->>> The bug itself seems to remain unfixed, as I was able to reproduce that
-> ->>> with a Fedora 41 guest, as well as an AlmaLinux 8 guest.
However our -> ->>> cpr-transfer code also seems to be buggy as it triggers the crash - -> ->>> without the cpr-transfer migration the above reproduce doesn't lead to -> ->>> crash on the source VM. -> ->>> -> ->>> I suspect that, as cpr-transfer doesn't migrate the guest memory, but -> ->>> rather passes it through the memory backend object, our code might -> ->>> somehow corrupt the VRAM.  However, I wasn't able to trace the -> ->>> corruption so far. -> ->>> -> ->>> Could somebody help the investigation and take a look into this?  Any -> ->>> suggestions would be appreciated.  Thanks! -> ->> -> ->> Possibly some memory region created by qxl is not being preserved. -> ->> Try adding these traces to see what is preserved: -> ->> -> ->> -trace enable='*cpr*' -> ->> -trace enable='*ram_alloc*' -> -> -> -> Also try adding this patch to see if it flags any ram blocks as not -> -> compatible with cpr.  A message is printed at migration start time. -> ->  -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email- -> -> steven.sistare@oracle.com/ -> -> -> -> - Steve -> -> -> -> -With the traces enabled + the "migration: ram block cpr blockers" patch -> -applied: -> -> -Source: -> -> cpr_find_fd pc.bios, id 0 returns -1 -> -> cpr_save_fd pc.bios, id 0, fd 22 -> -> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -> -> 0x7fec18e00000 -> -> cpr_find_fd pc.rom, id 0 returns -1 -> -> cpr_save_fd pc.rom, id 0, fd 23 -> -> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -> -> 0x7fec18c00000 -> -> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -> -> cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -> -> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd -> -> 24 host 0x7fec18a00000 -> -> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -> -> cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -> -> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 -> -> fd 25 host 
0x7feb77e00000 -> -> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -> -> cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -> -> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 27 -> -> host 0x7fec18800000 -> -> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -> -> cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -> -> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 -> -> fd 28 host 0x7feb73c00000 -> -> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -> -> cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -> -> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 34 -> -> host 0x7fec18600000 -> -> cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -> -> cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 -> -> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd -> -> 35 host 0x7fec18200000 -> -> cpr_find_fd /rom@etc/table-loader, id 0 returns -1 -> -> cpr_save_fd /rom@etc/table-loader, id 0, fd 36 -> -> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 36 -> -> host 0x7feb8b600000 -> -> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -> -> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 -> -> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 37 host -> -> 0x7feb8b400000 -> -> -> -> cpr_state_save cpr-transfer mode -> -> cpr_transfer_output /var/run/alma8cpr-dst.sock -> -> -Target: -> -> cpr_transfer_input /var/run/alma8cpr-dst.sock -> -> cpr_state_load cpr-transfer mode -> -> cpr_find_fd pc.bios, id 0 returns 20 -> -> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -> -> 0x7fcdc9800000 -> -> cpr_find_fd pc.rom, id 0 returns 19 -> -> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -> -> 0x7fcdc9600000 -> -> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -> -> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd -> -> 18 host 0x7fcdc9400000 -> -> cpr_find_fd 
0000:00:02.0/vga.vram, id 0 returns 17 -> -> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 -> -> fd 17 host 0x7fcd27e00000 -> -> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -> -> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 16 -> -> host 0x7fcdc9200000 -> -> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 -> -> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 -> -> fd 15 host 0x7fcd23c00000 -> -> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 -> -> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 14 -> -> host 0x7fcdc8800000 -> -> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -> -> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd -> -> 13 host 0x7fcdc8400000 -> -> cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -> -> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 11 -> -> host 0x7fcdc8200000 -> -> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 -> -> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 10 host -> -> 0x7fcd3be00000 -> -> -Looks like both vga.vram and qxl.vram are being preserved (with the same -> -addresses), and no incompatible ram blocks are found during migration. -> -Sorry, addressed are not the same, of course. However corresponding ram -blocks do seem to be preserved and initialized. 
-
-On 2/28/2025 1:37 PM, Andrey Drobyshev wrote:
-[... full quote of the earlier messages and traces trimmed ...]
-
-So far, I have not reproduced the guest driver failure.
-
-However, I have isolated places where new QEMU improperly writes to
-the qxl memory regions prior to starting the guest, by mmap'ing them
-readonly after cpr:
-
-  qemu_ram_alloc_internal()
-    if (reused && (strstr(name, "qxl") || strstr(name, "vga")))
-        ram_flags |= RAM_READONLY;
-    new_block = qemu_ram_alloc_from_fd(...)
-
-I have attached a draft fix; try it and let me know.
-My console window looks fine before and after cpr, using
--vnc $hostip:0 -vga qxl
-
-- Steve
-0001-hw-qxl-cpr-support-preliminary.patch
-Description:
-Text document
-
-On 3/4/25 9:05 PM, Steven Sistare wrote:
-> -[... full quote of the earlier messages and traces trimmed ...]
-
-Regarding the reproducer: when I launch the buggy version with the same
-options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer,
-my VNC client silently hangs on the target after a while.  Could it
-happen on your setup as well?  Could you try launching the VM with
-"-nographic -device qxl-vga"?  That way the VM's serial console is given to
-you directly in the shell, so when the qxl driver crashes you're still able
-to inspect the kernel messages.
-
-As for your patch, I can report that it doesn't resolve the issue as it
-is.  But I was able to track down another possible memory corruption
-using your approach with readonly mmap'ing:
-
-> -Program terminated with signal SIGSEGV, Segmentation fault.
-> -#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -> -412 d->ram->magic = cpu_to_le32(QXL_RAM_MAGIC); -> -[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] -> -(gdb) bt -> -#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -> -#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, -> -errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 -> -#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, -> -errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 -> -#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, -> -errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 -> -#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, -> -value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 -> -#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, -> -v=0x5638996f3770, name=0x56389759b141 "realized", opaque=0x5638987893d0, -> -errp=0x7ffd3c2b84e0) -> -at ../qom/object.c:2374 -> -#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, -> -name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) -> -at ../qom/object.c:1449 -> -#7 0x00005638970f8586 in object_property_set_qobject (obj=0x5638996e0e70, -> -name=0x56389759b141 "realized", value=0x5638996df900, errp=0x7ffd3c2b84e0) -> -at ../qom/qom-qobject.c:28 -> -#8 0x00005638970f3d8d in object_property_set_bool (obj=0x5638996e0e70, -> -name=0x56389759b141 "realized", value=true, errp=0x7ffd3c2b84e0) -> -at ../qom/object.c:1519 -> -#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, -> -bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 -> -#10 0x0000563896dba675 in qdev_device_add_from_qdict (opts=0x5638996dfe50, -> -from_json=false, errp=0x7ffd3c2b84e0) at ../system/qdev-monitor.c:714 -> -#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, -> -errp=0x56389855dc40 ) at ../system/qdev-monitor.c:733 -> -#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, opts=0x563898786150, -> 
-errp=0x56389855dc40 ) at ../system/vl.c:1207
-> -#13 0x000056389737a6cc in qemu_opts_foreach
-> -(list=0x563898427b60 , func=0x563896dc48ca
-> -, opaque=0x0, errp=0x56389855dc40 )
-> -at ../util/qemu-option.c:1135
-> -#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/vl.c:2745
-> -#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40
-> -) at ../system/vl.c:2806
-> -#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) at
-> -../system/vl.c:3838
-> -#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at
-> -../system/main.c:72
-
-So the attached adjusted version of your patch does seem to help.  At
-least I can't reproduce the crash on my setup.
-
-I'm wondering, could it be useful to explicitly mark all the reused
-memory regions readonly upon cpr-transfer, and then make them writable
-again after the migration is done?  That way we would segfault early on
-instead of debugging tricky memory corruptions.
-
-Andrey
-0001-hw-qxl-cpr-support-preliminary.patch
-Description:
-Text Data
-
-On 3/5/2025 11:50 AM, Andrey Drobyshev wrote:
-[... full quote of the earlier messages and traces trimmed ...]
However, corresponding ram
-blocks do seem to be preserved and initialized.
-So far, I have not reproduced the guest driver failure.
-
-However, I have isolated places where new QEMU improperly writes to
-the qxl memory regions prior to starting the guest, by mmap'ing them
-readonly after cpr:
-
-   qemu_ram_alloc_internal()
-     if (reused && (strstr(name, "qxl") || strstr(name, "vga")))
-         ram_flags |= RAM_READONLY;
-     new_block = qemu_ram_alloc_from_fd(...)
-
-I have attached a draft fix; try it and let me know.
-My console window looks fine before and after cpr, using
--vnc $hostip:0 -vga qxl
-
-- Steve
-Regarding the reproducer: when I launch the buggy version with the same
-options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer,
-my VNC client silently hangs on the target after a while.  Could it
-happen on your stand as well?
-cpr does not preserve the vnc connection and session.  To test, I specify
-port 0 for the source VM and port 1 for the dest.  When the src vnc goes
-dormant the dest vnc becomes active.
-Could you try launching the VM with
-"-nographic -device qxl-vga"?  That way the VM's serial console is given to you
-directly in the shell, so when the qxl driver crashes you're still able to
-inspect the kernel messages.
-I have been running like that, but have not reproduced the qxl driver crash,
-and I suspect my guest image+kernel is too old.  However, once I realized the
-issue was post-cpr modification of qxl memory, I switched my attention to the
-fix.
-As for your patch, I can report that it doesn't resolve the issue as it
-is.  But I was able to track down another possible memory corruption
-using your approach with readonly mmap'ing:
-Program terminated with signal SIGSEGV, Segmentation fault.
-#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -412 d->ram->magic = cpu_to_le32(QXL_RAM_MAGIC); -[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] -(gdb) bt -#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, -errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 -#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, -errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 -#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, -errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 -#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, value=true, -errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 -#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, v=0x5638996f3770, -name=0x56389759b141 "realized", opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) - at ../qom/object.c:2374 -#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, name=0x56389759b141 -"realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) - at ../qom/object.c:1449 -#7 0x00005638970f8586 in object_property_set_qobject (obj=0x5638996e0e70, -name=0x56389759b141 "realized", value=0x5638996df900, errp=0x7ffd3c2b84e0) - at ../qom/qom-qobject.c:28 -#8 0x00005638970f3d8d in object_property_set_bool (obj=0x5638996e0e70, -name=0x56389759b141 "realized", value=true, errp=0x7ffd3c2b84e0) - at ../qom/object.c:1519 -#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, bus=0x563898cf3c20, -errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 -#10 0x0000563896dba675 in qdev_device_add_from_qdict (opts=0x5638996dfe50, -from_json=false, errp=0x7ffd3c2b84e0) at ../system/qdev-monitor.c:714 -#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, errp=0x56389855dc40 -) at ../system/qdev-monitor.c:733 -#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, opts=0x563898786150, -errp=0x56389855dc40 ) at ../system/vl.c:1207 -#13 0x000056389737a6cc in qemu_opts_foreach - (list=0x563898427b60 , 
func=0x563896dc48ca , -opaque=0x0, errp=0x56389855dc40 ) - at ../util/qemu-option.c:1135 -#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/vl.c:2745 -#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 -) at ../system/vl.c:2806 -#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) at -../system/vl.c:3838 -#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at -../system/main.c:72 -So the attached adjusted version of your patch does seem to help. At -least I can't reproduce the crash on my stand. -Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram are -definitely harmful. Try V2 of the patch, attached, which skips the lines -of init_qxl_ram that modify guest memory. -I'm wondering, could it be useful to explicitly mark all the reused -memory regions readonly upon cpr-transfer, and then make them writable -back again after the migration is done? That way we will be segfaulting -early on instead of debugging tricky memory corruptions. -It's a useful debugging technique, but changing protection on a large memory -region -can be too expensive for production due to TLB shootdowns. - -Also, there are cases where writes are performed but the value is guaranteed to -be the same: - qxl_post_load() - qxl_set_mode() - d->rom->mode = cpu_to_le32(modenr); -The value is the same because mode and shadow_rom.mode were passed in vmstate -from old qemu. 
- -- Steve
-0001-hw-qxl-cpr-support-preliminary-V2.patch
-Description:
-Text document
-
-On 3/5/25 22:19, Steven Sistare wrote:
-On 3/5/2025 11:50 AM, Andrey Drobyshev wrote:
-[...]
-I'm wondering, could it be useful to explicitly mark all the reused
-memory regions readonly upon cpr-transfer, and then make them writable
-back again after the migration is done?  That way we will be segfaulting
-early on instead of debugging tricky memory corruptions.
-It's a useful debugging technique, but changing protection on a large
-memory region
-can be too expensive for production due to TLB shootdowns.
-Good point.
Though we could move this code under non-default option to -avoid re-writing. - -Den - -On 3/5/25 11:19 PM, Steven Sistare wrote: -> -On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: -> -> On 3/4/25 9:05 PM, Steven Sistare wrote: -> ->> On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: -> ->>> On 2/28/25 8:35 PM, Andrey Drobyshev wrote: -> ->>>> On 2/28/25 8:20 PM, Steven Sistare wrote: -> ->>>>> On 2/28/2025 1:13 PM, Steven Sistare wrote: -> ->>>>>> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -> ->>>>>>> Hi all, -> ->>>>>>> -> ->>>>>>> We've been experimenting with cpr-transfer migration mode recently -> ->>>>>>> and -> ->>>>>>> have discovered the following issue with the guest QXL driver: -> ->>>>>>> -> ->>>>>>> Run migration source: -> ->>>>>>>> EMULATOR=/path/to/emulator -> ->>>>>>>> ROOTFS=/path/to/image -> ->>>>>>>> QMPSOCK=/var/run/alma8qmp-src.sock -> ->>>>>>>> -> ->>>>>>>> $EMULATOR -enable-kvm \ -> ->>>>>>>>        -machine q35 \ -> ->>>>>>>>        -cpu host -smp 2 -m 2G \ -> ->>>>>>>>        -object memory-backend-file,id=ram0,size=2G,mem-path=/ -> ->>>>>>>> dev/shm/ -> ->>>>>>>> ram0,share=on\ -> ->>>>>>>>        -machine memory-backend=ram0 \ -> ->>>>>>>>        -machine aux-ram-share=on \ -> ->>>>>>>>        -drive file=$ROOTFS,media=disk,if=virtio \ -> ->>>>>>>>        -qmp unix:$QMPSOCK,server=on,wait=off \ -> ->>>>>>>>        -nographic \ -> ->>>>>>>>        -device qxl-vga -> ->>>>>>> -> ->>>>>>> Run migration target: -> ->>>>>>>> EMULATOR=/path/to/emulator -> ->>>>>>>> ROOTFS=/path/to/image -> ->>>>>>>> QMPSOCK=/var/run/alma8qmp-dst.sock -> ->>>>>>>> $EMULATOR -enable-kvm \ -> ->>>>>>>>        -machine q35 \ -> ->>>>>>>>        -cpu host -smp 2 -m 2G \ -> ->>>>>>>>        -object memory-backend-file,id=ram0,size=2G,mem-path=/ -> ->>>>>>>> dev/shm/ -> ->>>>>>>> ram0,share=on\ -> ->>>>>>>>        -machine memory-backend=ram0 \ -> ->>>>>>>>        -machine aux-ram-share=on \ -> ->>>>>>>>        -drive file=$ROOTFS,media=disk,if=virtio \ -> ->>>>>>>>   
     -qmp unix:$QMPSOCK,server=on,wait=off \ -> ->>>>>>>>        -nographic \ -> ->>>>>>>>        -device qxl-vga \ -> ->>>>>>>>        -incoming tcp:0:44444 \ -> ->>>>>>>>        -incoming '{"channel-type": "cpr", "addr": { "transport": -> ->>>>>>>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -> ->>>>>>> -> ->>>>>>> -> ->>>>>>> Launch the migration: -> ->>>>>>>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -> ->>>>>>>> QMPSOCK=/var/run/alma8qmp-src.sock -> ->>>>>>>> -> ->>>>>>>> $QMPSHELL -p $QMPSOCK < ->>>>>>>>        migrate-set-parameters mode=cpr-transfer -> ->>>>>>>>        migrate channels=[{"channel-type":"main","addr": -> ->>>>>>>> {"transport":"socket","type":"inet","host":"0","port":"44444"}}, -> ->>>>>>>> {"channel-type":"cpr","addr": -> ->>>>>>>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr- -> ->>>>>>>> dst.sock"}}] -> ->>>>>>>> EOF -> ->>>>>>> -> ->>>>>>> Then, after a while, QXL guest driver on target crashes spewing the -> ->>>>>>> following messages: -> ->>>>>>>> [   73.962002] [TTM] Buffer eviction failed -> ->>>>>>>> [   73.962072] qxl 0000:00:02.0: object_init failed for (3149824, -> ->>>>>>>> 0x00000001) -> ->>>>>>>> [   73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to -> ->>>>>>>> allocate VRAM BO -> ->>>>>>> -> ->>>>>>> That seems to be a known kernel QXL driver bug: -> ->>>>>>> -> ->>>>>>> -https://lore.kernel.org/all/20220907094423.93581-1- -> ->>>>>>> min_halo@163.com/T/ -> ->>>>>>> -https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ -> ->>>>>>> -> ->>>>>>> (the latter discussion contains that reproduce script which -> ->>>>>>> speeds up -> ->>>>>>> the crash in the guest): -> ->>>>>>>> #!/bin/bash -> ->>>>>>>> -> ->>>>>>>> chvt 3 -> ->>>>>>>> -> ->>>>>>>> for j in $(seq 80); do -> ->>>>>>>>            echo "$(date) starting round $j" -> ->>>>>>>>            if [ "$(journalctl --boot | grep "failed to allocate -> ->>>>>>>> VRAM -> ->>>>>>>> BO")" != "" ]; then -> 
->>>>>>>>                    echo "bug was reproduced after $j tries" -> ->>>>>>>>                    exit 1 -> ->>>>>>>>            fi -> ->>>>>>>>            for i in $(seq 100); do -> ->>>>>>>>                    dmesg > /dev/tty3 -> ->>>>>>>>            done -> ->>>>>>>> done -> ->>>>>>>> -> ->>>>>>>> echo "bug could not be reproduced" -> ->>>>>>>> exit 0 -> ->>>>>>> -> ->>>>>>> The bug itself seems to remain unfixed, as I was able to reproduce -> ->>>>>>> that -> ->>>>>>> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -> ->>>>>>> cpr-transfer code also seems to be buggy as it triggers the crash - -> ->>>>>>> without the cpr-transfer migration the above reproduce doesn't -> ->>>>>>> lead to -> ->>>>>>> crash on the source VM. -> ->>>>>>> -> ->>>>>>> I suspect that, as cpr-transfer doesn't migrate the guest -> ->>>>>>> memory, but -> ->>>>>>> rather passes it through the memory backend object, our code might -> ->>>>>>> somehow corrupt the VRAM.  However, I wasn't able to trace the -> ->>>>>>> corruption so far. -> ->>>>>>> -> ->>>>>>> Could somebody help the investigation and take a look into -> ->>>>>>> this?  Any -> ->>>>>>> suggestions would be appreciated.  Thanks! -> ->>>>>> -> ->>>>>> Possibly some memory region created by qxl is not being preserved. -> ->>>>>> Try adding these traces to see what is preserved: -> ->>>>>> -> ->>>>>> -trace enable='*cpr*' -> ->>>>>> -trace enable='*ram_alloc*' -> ->>>>> -> ->>>>> Also try adding this patch to see if it flags any ram blocks as not -> ->>>>> compatible with cpr.  A message is printed at migration start time. 
->>>>>    -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-
->>>>> email-
->>>>> steven.sistare@oracle.com/
->>>>>
->>>>> - Steve
->>>>>
->>>> [...]
->> I have attached a draft fix; try it and let me know.
->> My console window looks fine before and after cpr, using
->> -vnc $hostip:0 -vga qxl
->>
->> - Steve
->
-> Regarding the reproduce: when I launch the buggy version with the same
-> options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer,
-> my VNC client silently hangs on the target after a while.  Could it
-> happen on your stand as well?
->
-cpr does not preserve the vnc connection and session.  To test, I specify
-port 0 for the source VM and port 1 for the dest.  When the src vnc goes
-dormant the dest vnc becomes active.
->
-Sure, I meant that VNC on the dest (on port 1) works for a while
-after the migration and then hangs, apparently after the guest QXL crash.
-
-> -> Could you try launching the VM with
-> -> "-nographic -device qxl-vga"?  That way the VM's serial console is given
-> -> to you directly in the shell, so when the qxl driver crashes you're still
-> -> able to inspect the kernel messages.
-> ->
-> -I have been running like that, but have not reproduced the qxl driver crash,
-> -and I suspect my guest image+kernel is too old.
-
-Yes, that's probably the case.  But the crash occurs on my Fedora 41
-guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to
-be buggy.
-
-> -However, once I realized the issue was post-cpr modification of qxl
-> -memory, I switched my attention to the fix.
->
-> -> As for your patch, I can report that it doesn't resolve the issue as it
-> -> is.  But I was able to track down another possible memory corruption
-> -> using your approach with readonly mmap'ing:
-> ->
-> ->> Program terminated with signal SIGSEGV, Segmentation fault.
-> ->> #0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
-> ->> 412         d->ram->magic       = cpu_to_le32(QXL_RAM_MAGIC);
-> ->> [Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))]
-> ->> (gdb) bt
-> ->> #0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
-> ->> #1  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142
-> ->> #2  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257
-> ->> #3  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174
-> ->> #4  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494
-> ->> #5  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, v=0x5638996f3770, name=0x56389759b141 "realized", opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) at ../qom/object.c:2374
-> ->> #6  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) at ../qom/object.c:1449
-> ->> #7  0x00005638970f8586 in object_property_set_qobject (obj=0x5638996e0e70, name=0x56389759b141 "realized", value=0x5638996df900, errp=0x7ffd3c2b84e0) at ../qom/qom-qobject.c:28
-> ->> #8  0x00005638970f3d8d in object_property_set_bool (obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, errp=0x7ffd3c2b84e0) at ../qom/object.c:1519
-> ->> #9  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276
-> ->> #10 0x0000563896dba675 in qdev_device_add_from_qdict (opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../system/qdev-monitor.c:714
-> ->> #11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150,
-> ->> errp=0x56389855dc40 ) at ../system/qdev-monitor.c:733
-> ->> #12 0x0000563896dc48f1 in device_init_func (opaque=0x0, opts=0x563898786150, errp=0x56389855dc40 ) at ../system/vl.c:1207
-> ->> #13 0x000056389737a6cc in qemu_opts_foreach (list=0x563898427b60 , func=0x563896dc48ca , opaque=0x0, errp=0x56389855dc40 ) at ../util/qemu-option.c:1135
-> ->> #14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/vl.c:2745
-> ->> #15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 ) at ../system/vl.c:2806
-> ->> #16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) at ../system/vl.c:3838
-> ->> #17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../system/main.c:72
-> ->
-> -> So the attached adjusted version of your patch does seem to help.  At
-> -> least I can't reproduce the crash on my stand.
-> ->
-> -Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram are
-> -definitely harmful.  Try V2 of the patch, attached, which skips the lines
-> -of init_qxl_ram that modify guest memory.
->
-Thanks, your v2 patch does seem to prevent the crash. Would you re-send
-it to the list as a proper fix?
-
-> -> I'm wondering, could it be useful to explicitly mark all the reused
-> -> memory regions readonly upon cpr-transfer, and then make them writable
-> -> back again after the migration is done?  That way we will be segfaulting
-> -> early on instead of debugging tricky memory corruptions.
-> ->
-> -It's a useful debugging technique, but changing protection on a large
-> -memory region can be too expensive for production due to TLB shootdowns.
->
-> -Also, there are cases where writes are performed but the value is
-> -guaranteed to be the same:
-> -  qxl_post_load()
-> -    qxl_set_mode()
-> -      d->rom->mode = cpu_to_le32(modenr);
-> -The value is the same because mode and shadow_rom.mode were passed in
-> -vmstate from old qemu.
->
-There are also cases where a device's ROM might be re-initialized.  E.g.
-this segfault occurs upon further exploration of RO-mapped RAM blocks:
-
-> -Program terminated with signal SIGSEGV, Segmentation fault.
-> -#0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664
-> -664 rep movsb
-> -[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))]
-> -(gdb) bt
-> -#0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664
-> -#1 0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true) at ../hw/core/loader.c:1032
-> -#2 0x000055aa1d031577 in rom_add_blob (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", fw_callback=0x55aa1d441f59 , callback_opaque=0x55aa20ff0010, as=0x0, read_only=true) at ../hw/core/loader.c:1147
-> -#3 0x000055aa1cfd788d in acpi_add_rom_blob (update=0x55aa1d441f59 , opaque=0x55aa20ff0010, blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46
-> -#4 0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720
-> -#5 0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0) at ../hw/i386/pc.c:638
-> -#6 0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 , data=0x0) at ../util/notify.c:39
-> -#7 0x000055aa1d039ee5 in qdev_machine_creation_done () at ../hw/core/machine.c:1749
-> -#8 0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40
-> -) at
../system/vl.c:2779
-> -#9 0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 ) at ../system/vl.c:2807
-> -#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at ../system/vl.c:3838
-> -#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at ../system/main.c:72
-I'm not sure whether the ACPI tables ROM in particular is rewritten with
-the same content, but there might be cases where a ROM is re-read from
-the file system upon initialization.  That is undesirable, as the guest
-kernel certainly won't be too happy about a sudden change of the
-device's ROM content.
-
-So the issue we're dealing with here is any unwanted memory-related
-device initialization upon cpr.
-
-For now the only thing that comes to my mind is to make a test where we
-put as many devices as we can into a VM, make ram blocks RO upon cpr
-(and remap them as RW later after migration is done, if needed), and
-catch any unwanted memory violations.  As Den suggested, we might
-consider adding that behaviour as a separate non-default option (or a
-"migrate" command flag specific to cpr-transfer), which would only be
-used in the testing.
- -Andrey - -On 3/6/25 16:16, Andrey Drobyshev wrote: -On 3/5/25 11:19 PM, Steven Sistare wrote: -On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: -On 3/4/25 9:05 PM, Steven Sistare wrote: -On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: -On 2/28/25 8:35 PM, Andrey Drobyshev wrote: -On 2/28/25 8:20 PM, Steven Sistare wrote: -On 2/28/2025 1:13 PM, Steven Sistare wrote: -On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -Hi all, - -We've been experimenting with cpr-transfer migration mode recently -and -have discovered the following issue with the guest QXL driver: - -Run migration source: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-src.sock - -$EMULATOR -enable-kvm \ -        -machine q35 \ -        -cpu host -smp 2 -m 2G \ -        -object memory-backend-file,id=ram0,size=2G,mem-path=/ -dev/shm/ -ram0,share=on\ -        -machine memory-backend=ram0 \ -        -machine aux-ram-share=on \ -        -drive file=$ROOTFS,media=disk,if=virtio \ -        -qmp unix:$QMPSOCK,server=on,wait=off \ -        -nographic \ -        -device qxl-vga -Run migration target: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-dst.sock -$EMULATOR -enable-kvm \ -        -machine q35 \ -        -cpu host -smp 2 -m 2G \ -        -object memory-backend-file,id=ram0,size=2G,mem-path=/ -dev/shm/ -ram0,share=on\ -        -machine memory-backend=ram0 \ -        -machine aux-ram-share=on \ -        -drive file=$ROOTFS,media=disk,if=virtio \ -        -qmp unix:$QMPSOCK,server=on,wait=off \ -        -nographic \ -        -device qxl-vga \ -        -incoming tcp:0:44444 \ -        -incoming '{"channel-type": "cpr", "addr": { "transport": -"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -Launch the migration: -QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -QMPSOCK=/var/run/alma8qmp-src.sock - -$QMPSHELL -p $QMPSOCK < /dev/tty3 -            done -done - -echo "bug could not be reproduced" -exit 0 -The bug itself seems to 
remain unfixed, as I was able to reproduce -that -with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -cpr-transfer code also seems to be buggy as it triggers the crash - -without the cpr-transfer migration the above reproduce doesn't -lead to -crash on the source VM. - -I suspect that, as cpr-transfer doesn't migrate the guest -memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM.  However, I wasn't able to trace the -corruption so far. - -Could somebody help the investigation and take a look into -this?  Any -suggestions would be appreciated.  Thanks! -Possibly some memory region created by qxl is not being preserved. -Try adding these traces to see what is preserved: - --trace enable='*cpr*' --trace enable='*ram_alloc*' -Also try adding this patch to see if it flags any ram blocks as not -compatible with cpr.  A message is printed at migration start time. -    -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- -email- -steven.sistare@oracle.com/ - -- Steve -With the traces enabled + the "migration: ram block cpr blockers" -patch -applied: - -Source: -cpr_find_fd pc.bios, id 0 returns -1 -cpr_save_fd pc.bios, id 0, fd 22 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -0x7fec18e00000 -cpr_find_fd pc.rom, id 0 returns -1 -cpr_save_fd pc.rom, id 0, fd 23 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -0x7fec18c00000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 24 host 0x7fec18a00000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 25 host 0x7feb77e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -qemu_ram_alloc_shared 
0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 27 host 0x7fec18800000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 28 host 0x7feb73c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 34 host 0x7fec18600000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 35 host 0x7fec18200000 -cpr_find_fd /rom@etc/table-loader, id 0 returns -1 -cpr_save_fd /rom@etc/table-loader, id 0, fd 36 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 36 host 0x7feb8b600000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -37 host 0x7feb8b400000 - -cpr_state_save cpr-transfer mode -cpr_transfer_output /var/run/alma8cpr-dst.sock -Target: -cpr_transfer_input /var/run/alma8cpr-dst.sock -cpr_state_load cpr-transfer mode -cpr_find_fd pc.bios, id 0 returns 20 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -0x7fcdc9800000 -cpr_find_fd pc.rom, id 0 returns 19 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -0x7fcdc9600000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 18 host 0x7fcdc9400000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 17 host 0x7fcd27e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 16 host 0x7fcdc9200000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 
-qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 15 host 0x7fcd23c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 14 host 0x7fcdc8800000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 13 host 0x7fcdc8400000 -cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 11 host 0x7fcdc8200000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -10 host 0x7fcd3be00000 -Looks like both vga.vram and qxl.vram are being preserved (with the -same -addresses), and no incompatible ram blocks are found during migration. -Sorry, addressed are not the same, of course.  However corresponding -ram -blocks do seem to be preserved and initialized. -So far, I have not reproduced the guest driver failure. - -However, I have isolated places where new QEMU improperly writes to -the qxl memory regions prior to starting the guest, by mmap'ing them -readonly after cpr: - -    qemu_ram_alloc_internal() -      if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) -          ram_flags |= RAM_READONLY; -      new_block = qemu_ram_alloc_from_fd(...) - -I have attached a draft fix; try it and let me know. -My console window looks fine before and after cpr, using --vnc $hostip:0 -vga qxl - -- Steve -Regarding the reproduce: when I launch the buggy version with the same -options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, -my VNC client silently hangs on the target after a while.  Could it -happen on your stand as well? -cpr does not preserve the vnc connection and session.  To test, I specify -port 0 for the source VM and port 1 for the dest.  When the src vnc goes -dormant the dest vnc becomes active. 
-Sure, I meant that VNC on the dest (on the port 1) works for a while -after the migration and then hangs, apparently after the guest QXL crash. -Could you try launching VM with -"-nographic -device qxl-vga"?  That way VM's serial console is given you -directly in the shell, so when qxl driver crashes you're still able to -inspect the kernel messages. -I have been running like that, but have not reproduced the qxl driver -crash, -and I suspect my guest image+kernel is too old. -Yes, that's probably the case. But the crash occurs on my Fedora 41 -guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to -be buggy. -However, once I realized the -issue was post-cpr modification of qxl memory, I switched my attention -to the -fix. -As for your patch, I can report that it doesn't resolve the issue as it -is.  But I was able to track down another possible memory corruption -using your approach with readonly mmap'ing: -Program terminated with signal SIGSEGV, Segmentation fault. -#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -412         d->ram->magic       = cpu_to_le32(QXL_RAM_MAGIC); -[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] -(gdb) bt -#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -#1  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, -errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 -#2  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, -errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 -#3  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, -errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 -#4  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, -value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 -#5  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, -v=0x5638996f3770, name=0x56389759b141 "realized", -opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) -      at ../qom/object.c:2374 -#6  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, 
-name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) -      at ../qom/object.c:1449 -#7  0x00005638970f8586 in object_property_set_qobject -(obj=0x5638996e0e70, name=0x56389759b141 "realized", -value=0x5638996df900, errp=0x7ffd3c2b84e0) -      at ../qom/qom-qobject.c:28 -#8  0x00005638970f3d8d in object_property_set_bool -(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, -errp=0x7ffd3c2b84e0) -      at ../qom/object.c:1519 -#9  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, -bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 -#10 0x0000563896dba675 in qdev_device_add_from_qdict -(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ -system/qdev-monitor.c:714 -#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, -errp=0x56389855dc40 ) at ../system/qdev-monitor.c:733 -#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, -opts=0x563898786150, errp=0x56389855dc40 ) at ../system/ -vl.c:1207 -#13 0x000056389737a6cc in qemu_opts_foreach -      (list=0x563898427b60 , func=0x563896dc48ca -, opaque=0x0, errp=0x56389855dc40 ) -      at ../util/qemu-option.c:1135 -#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ -vl.c:2745 -#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 -) at ../system/vl.c:2806 -#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) -at ../system/vl.c:3838 -#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ -system/main.c:72 -So the attached adjusted version of your patch does seem to help.  At -least I can't reproduce the crash on my stand. -Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram -are -definitely harmful.  Try V2 of the patch, attached, which skips the lines -of init_qxl_ram that modify guest memory. -Thanks, your v2 patch does seem to prevent the crash. Would you re-send -it to the list as a proper fix? 
-I'm wondering, could it be useful to explicitly mark all the reused -memory regions readonly upon cpr-transfer, and then make them writable -back again after the migration is done?  That way we will be segfaulting -early on instead of debugging tricky memory corruptions. -It's a useful debugging technique, but changing protection on a large -memory region -can be too expensive for production due to TLB shootdowns. - -Also, there are cases where writes are performed but the value is -guaranteed to -be the same: -   qxl_post_load() -     qxl_set_mode() -       d->rom->mode = cpu_to_le32(modenr); -The value is the same because mode and shadow_rom.mode were passed in -vmstate -from old qemu. -There're also cases where devices' ROM might be re-initialized. E.g. -this segfault occures upon further exploration of RO mapped RAM blocks: -Program terminated with signal SIGSEGV, Segmentation fault. -#0 __memmove_avx_unaligned_erms () at -../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 -664 rep movsb -[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))] -(gdb) bt -#0 __memmove_avx_unaligned_erms () at -../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 -#1 0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, -name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true) - at ../hw/core/loader.c:1032 -#2 0x000055aa1d031577 in rom_add_blob - (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, -addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", -fw_callback=0x55aa1d441f59 , callback_opaque=0x55aa20ff0010, as=0x0, -read_only=true) at ../hw/core/loader.c:1147 -#3 0x000055aa1cfd788d in acpi_add_rom_blob - (update=0x55aa1d441f59 , opaque=0x55aa20ff0010, -blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46 -#4 0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720 -#5 0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, 
data=0x0) -at ../hw/i386/pc.c:638 -#6 0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 -, data=0x0) at ../util/notify.c:39 -#7 0x000055aa1d039ee5 in qdev_machine_creation_done () at -../hw/core/machine.c:1749 -#8 0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 -) at ../system/vl.c:2779 -#9 0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 -) at ../system/vl.c:2807 -#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at -../system/vl.c:3838 -#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at -../system/main.c:72 -I'm not sure whether ACPI tables ROM in particular is rewritten with the -same content, but there might be cases where ROM can be read from file -system upon initialization. That is undesirable as guest kernel -certainly won't be too happy about sudden change of the device's ROM -content. - -So the issue we're dealing with here is any unwanted memory related -device initialization upon cpr. - -For now the only thing that comes to my mind is to make a test where we -put as many devices as we can into a VM, make ram blocks RO upon cpr -(and remap them as RW later after migration is done, if needed), and -catch any unwanted memory violations. As Den suggested, we might -consider adding that behaviour as a separate non-default option (or -"migrate" command flag specific to cpr-transfer), which would only be -used in the testing. - -Andrey -No way. ACPI with the source must be used in the same way as BIOSes -and optional ROMs. - -Den - -On 3/6/2025 10:52 AM, Denis V. 
Lunev wrote: -On 3/6/25 16:16, Andrey Drobyshev wrote: -On 3/5/25 11:19 PM, Steven Sistare wrote: -On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: -On 3/4/25 9:05 PM, Steven Sistare wrote: -On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: -On 2/28/25 8:35 PM, Andrey Drobyshev wrote: -On 2/28/25 8:20 PM, Steven Sistare wrote: -On 2/28/2025 1:13 PM, Steven Sistare wrote: -On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -Hi all, - -We've been experimenting with cpr-transfer migration mode recently -and -have discovered the following issue with the guest QXL driver: - -Run migration source: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-src.sock - -$EMULATOR -enable-kvm \ -        -machine q35 \ -        -cpu host -smp 2 -m 2G \ -        -object memory-backend-file,id=ram0,size=2G,mem-path=/ -dev/shm/ -ram0,share=on\ -        -machine memory-backend=ram0 \ -        -machine aux-ram-share=on \ -        -drive file=$ROOTFS,media=disk,if=virtio \ -        -qmp unix:$QMPSOCK,server=on,wait=off \ -        -nographic \ -        -device qxl-vga -Run migration target: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-dst.sock -$EMULATOR -enable-kvm \ -        -machine q35 \ -        -cpu host -smp 2 -m 2G \ -        -object memory-backend-file,id=ram0,size=2G,mem-path=/ -dev/shm/ -ram0,share=on\ -        -machine memory-backend=ram0 \ -        -machine aux-ram-share=on \ -        -drive file=$ROOTFS,media=disk,if=virtio \ -        -qmp unix:$QMPSOCK,server=on,wait=off \ -        -nographic \ -        -device qxl-vga \ -        -incoming tcp:0:44444 \ -        -incoming '{"channel-type": "cpr", "addr": { "transport": -"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' -Launch the migration: -QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -QMPSOCK=/var/run/alma8qmp-src.sock - -$QMPSHELL -p $QMPSOCK < /dev/tty3 -            done -done - -echo "bug could not be reproduced" -exit 0 -The bug itself seems to 
remain unfixed, as I was able to reproduce -that -with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -cpr-transfer code also seems to be buggy as it triggers the crash - -without the cpr-transfer migration the above reproduce doesn't -lead to -crash on the source VM. - -I suspect that, as cpr-transfer doesn't migrate the guest -memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM.  However, I wasn't able to trace the -corruption so far. - -Could somebody help the investigation and take a look into -this?  Any -suggestions would be appreciated.  Thanks! -Possibly some memory region created by qxl is not being preserved. -Try adding these traces to see what is preserved: - --trace enable='*cpr*' --trace enable='*ram_alloc*' -Also try adding this patch to see if it flags any ram blocks as not -compatible with cpr.  A message is printed at migration start time. -    -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- -email- -steven.sistare@oracle.com/ - -- Steve -With the traces enabled + the "migration: ram block cpr blockers" -patch -applied: - -Source: -cpr_find_fd pc.bios, id 0 returns -1 -cpr_save_fd pc.bios, id 0, fd 22 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -0x7fec18e00000 -cpr_find_fd pc.rom, id 0 returns -1 -cpr_save_fd pc.rom, id 0, fd 23 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -0x7fec18c00000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 24 host 0x7fec18a00000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 25 host 0x7feb77e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -qemu_ram_alloc_shared 
0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 27 host 0x7fec18800000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 28 host 0x7feb73c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 34 host 0x7fec18600000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 35 host 0x7fec18200000 -cpr_find_fd /rom@etc/table-loader, id 0 returns -1 -cpr_save_fd /rom@etc/table-loader, id 0, fd 36 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 36 host 0x7feb8b600000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -37 host 0x7feb8b400000 - -cpr_state_save cpr-transfer mode -cpr_transfer_output /var/run/alma8cpr-dst.sock -Target: -cpr_transfer_input /var/run/alma8cpr-dst.sock -cpr_state_load cpr-transfer mode -cpr_find_fd pc.bios, id 0 returns 20 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -0x7fcdc9800000 -cpr_find_fd pc.rom, id 0 returns 19 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -0x7fcdc9600000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 18 host 0x7fcdc9400000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 17 host 0x7fcd27e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 16 host 0x7fcdc9200000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 
-qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 15 host 0x7fcd23c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 14 host 0x7fcdc8800000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 13 host 0x7fcdc8400000 -cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 11 host 0x7fcdc8200000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -10 host 0x7fcd3be00000 -Looks like both vga.vram and qxl.vram are being preserved (with the -same -addresses), and no incompatible ram blocks are found during migration. -Sorry, addressed are not the same, of course.  However corresponding -ram -blocks do seem to be preserved and initialized. -So far, I have not reproduced the guest driver failure. - -However, I have isolated places where new QEMU improperly writes to -the qxl memory regions prior to starting the guest, by mmap'ing them -readonly after cpr: - -    qemu_ram_alloc_internal() -      if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) -          ram_flags |= RAM_READONLY; -      new_block = qemu_ram_alloc_from_fd(...) - -I have attached a draft fix; try it and let me know. -My console window looks fine before and after cpr, using --vnc $hostip:0 -vga qxl - -- Steve -Regarding the reproduce: when I launch the buggy version with the same -options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, -my VNC client silently hangs on the target after a while.  Could it -happen on your stand as well? -cpr does not preserve the vnc connection and session.  To test, I specify -port 0 for the source VM and port 1 for the dest.  When the src vnc goes -dormant the dest vnc becomes active. 
-Sure, I meant that VNC on the dest (on the port 1) works for a while -after the migration and then hangs, apparently after the guest QXL crash. -Could you try launching VM with -"-nographic -device qxl-vga"?  That way VM's serial console is given you -directly in the shell, so when qxl driver crashes you're still able to -inspect the kernel messages. -I have been running like that, but have not reproduced the qxl driver -crash, -and I suspect my guest image+kernel is too old. -Yes, that's probably the case.  But the crash occurs on my Fedora 41 -guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to -be buggy. -However, once I realized the -issue was post-cpr modification of qxl memory, I switched my attention -to the -fix. -As for your patch, I can report that it doesn't resolve the issue as it -is.  But I was able to track down another possible memory corruption -using your approach with readonly mmap'ing: -Program terminated with signal SIGSEGV, Segmentation fault. -#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -412         d->ram->magic       = cpu_to_le32(QXL_RAM_MAGIC); -[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] -(gdb) bt -#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -#1  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, -errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 -#2  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, -errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 -#3  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, -errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 -#4  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, -value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 -#5  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, -v=0x5638996f3770, name=0x56389759b141 "realized", -opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) -      at ../qom/object.c:2374 -#6  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, 
-name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) -      at ../qom/object.c:1449 -#7  0x00005638970f8586 in object_property_set_qobject -(obj=0x5638996e0e70, name=0x56389759b141 "realized", -value=0x5638996df900, errp=0x7ffd3c2b84e0) -      at ../qom/qom-qobject.c:28 -#8  0x00005638970f3d8d in object_property_set_bool -(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, -errp=0x7ffd3c2b84e0) -      at ../qom/object.c:1519 -#9  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, -bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 -#10 0x0000563896dba675 in qdev_device_add_from_qdict -(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ -system/qdev-monitor.c:714 -#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, -errp=0x56389855dc40 ) at ../system/qdev-monitor.c:733 -#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, -opts=0x563898786150, errp=0x56389855dc40 ) at ../system/ -vl.c:1207 -#13 0x000056389737a6cc in qemu_opts_foreach -      (list=0x563898427b60 , func=0x563896dc48ca -, opaque=0x0, errp=0x56389855dc40 ) -      at ../util/qemu-option.c:1135 -#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ -vl.c:2745 -#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 -) at ../system/vl.c:2806 -#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) -at ../system/vl.c:3838 -#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ -system/main.c:72 -So the attached adjusted version of your patch does seem to help.  At -least I can't reproduce the crash on my stand. -Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram -are -definitely harmful.  Try V2 of the patch, attached, which skips the lines -of init_qxl_ram that modify guest memory. -Thanks, your v2 patch does seem to prevent the crash.  Would you re-send -it to the list as a proper fix? -Yes. Was waiting for your confirmation. 
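The v2 patch itself is only an attachment to the original mails, but its described shape — skip the guest-visible writes in init_qxl_ram when the RAM block was inherited over cpr — can be modelled with a toy struct. Everything below is invented for illustration (the flag, the struct, and the magic value stand in for QEMU's real code paths):

```c
#include <stdint.h>

/* Toy model of the v2 fix: only (re)initialise guest-visible state when
 * the RAM block is fresh.  All names and values here are invented --
 * the real patch gates the writes inside init_qxl_ram(). */
struct toy_ram {
    uint32_t magic;
    uint32_t ring_head;
};

enum { TOY_MAGIC = 0x514d4558 };  /* invented, not QXL_RAM_MAGIC */

void toy_init_ram(struct toy_ram *ram, int inherited_over_cpr)
{
    if (inherited_over_cpr) {
        /* The running guest still owns this state; touching it here is
         * exactly the corruption seen in the backtrace above. */
        return;
    }
    ram->magic = TOY_MAGIC;
    ram->ring_head = 0;           /* stands in for SPICE_RING_INIT() */
}
```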
-I'm wondering, could it be useful to explicitly mark all the reused -memory regions readonly upon cpr-transfer, and then make them writable -back again after the migration is done?  That way we will be segfaulting -early on instead of debugging tricky memory corruptions. -It's a useful debugging technique, but changing protection on a large -memory region -can be too expensive for production due to TLB shootdowns. - -Also, there are cases where writes are performed but the value is -guaranteed to -be the same: -   qxl_post_load() -     qxl_set_mode() -       d->rom->mode = cpu_to_le32(modenr); -The value is the same because mode and shadow_rom.mode were passed in -vmstate -from old qemu. -There're also cases where devices' ROM might be re-initialized.  E.g. -this segfault occures upon further exploration of RO mapped RAM blocks: -Program terminated with signal SIGSEGV, Segmentation fault. -#0  __memmove_avx_unaligned_erms () at -../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 -664             rep     movsb -[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))] -(gdb) bt -#0  __memmove_avx_unaligned_erms () at -../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 -#1  0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, -name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true) -     at ../hw/core/loader.c:1032 -#2  0x000055aa1d031577 in rom_add_blob -     (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, -addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", -fw_callback=0x55aa1d441f59 , callback_opaque=0x55aa20ff0010, as=0x0, -read_only=true) at ../hw/core/loader.c:1147 -#3  0x000055aa1cfd788d in acpi_add_rom_blob -     (update=0x55aa1d441f59 , opaque=0x55aa20ff0010, -blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46 -#4  0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720 -#5  0x000055aa1d434199 in 
pc_machine_done (notifier=0x55aa1ff15050, data=0x0) -at ../hw/i386/pc.c:638 -#6  0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 -, data=0x0) at ../util/notify.c:39 -#7  0x000055aa1d039ee5 in qdev_machine_creation_done () at -../hw/core/machine.c:1749 -#8  0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 -) at ../system/vl.c:2779 -#9  0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 -) at ../system/vl.c:2807 -#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at -../system/vl.c:3838 -#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at -../system/main.c:72 -I'm not sure whether ACPI tables ROM in particular is rewritten with the -same content, but there might be cases where ROM can be read from file -system upon initialization.  That is undesirable as guest kernel -certainly won't be too happy about sudden change of the device's ROM -content. - -So the issue we're dealing with here is any unwanted memory related -device initialization upon cpr. - -For now the only thing that comes to my mind is to make a test where we -put as many devices as we can into a VM, make ram blocks RO upon cpr -(and remap them as RW later after migration is done, if needed), and -catch any unwanted memory violations.  As Den suggested, we might -consider adding that behaviour as a separate non-default option (or -"migrate" command flag specific to cpr-transfer), which would only be -used in the testing. -I'll look into adding an option, but there may be too many false positives, -such as the qxl_set_mode case above. And the maintainers may object to me -eliminating the false positives by adding more CPR_IN tests, due to gratuitous -(from their POV) ugliness. - -But I will use the technique to look for more write violations. -Andrey -No way. ACPI with the source must be used in the same way as BIOSes -and optional ROMs. -Yup, its a bug. Will fix. 
- -- Steve - -see -1741380954-341079-1-git-send-email-steven.sistare@oracle.com -/">https://lore.kernel.org/qemu-devel/ -1741380954-341079-1-git-send-email-steven.sistare@oracle.com -/ -- Steve - -On 3/6/2025 11:13 AM, Steven Sistare wrote: -On 3/6/2025 10:52 AM, Denis V. Lunev wrote: -On 3/6/25 16:16, Andrey Drobyshev wrote: -On 3/5/25 11:19 PM, Steven Sistare wrote: -On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: -On 3/4/25 9:05 PM, Steven Sistare wrote: -On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: -On 2/28/25 8:35 PM, Andrey Drobyshev wrote: -On 2/28/25 8:20 PM, Steven Sistare wrote: -On 2/28/2025 1:13 PM, Steven Sistare wrote: -On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: -Hi all, - -We've been experimenting with cpr-transfer migration mode recently -and -have discovered the following issue with the guest QXL driver: - -Run migration source: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-src.sock - -$EMULATOR -enable-kvm \ -        -machine q35 \ -        -cpu host -smp 2 -m 2G \ -        -object memory-backend-file,id=ram0,size=2G,mem-path=/ -dev/shm/ -ram0,share=on\ -        -machine memory-backend=ram0 \ -        -machine aux-ram-share=on \ -        -drive file=$ROOTFS,media=disk,if=virtio \ -        -qmp unix:$QMPSOCK,server=on,wait=off \ -        -nographic \ -        -device qxl-vga -Run migration target: -EMULATOR=/path/to/emulator -ROOTFS=/path/to/image -QMPSOCK=/var/run/alma8qmp-dst.sock -$EMULATOR -enable-kvm \ -        -machine q35 \ -        -cpu host -smp 2 -m 2G \ -        -object memory-backend-file,id=ram0,size=2G,mem-path=/ -dev/shm/ -ram0,share=on\ -        -machine memory-backend=ram0 \ -        -machine aux-ram-share=on \ -        -drive file=$ROOTFS,media=disk,if=virtio \ -        -qmp unix:$QMPSOCK,server=on,wait=off \ -        -nographic \ -        -device qxl-vga \ -        -incoming tcp:0:44444 \ -        -incoming '{"channel-type": "cpr", "addr": { "transport": -"socket", "type": "unix", "path": 
"/var/run/alma8cpr-dst.sock"}}' -Launch the migration: -QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell -QMPSOCK=/var/run/alma8qmp-src.sock - -$QMPSHELL -p $QMPSOCK < /dev/tty3 -            done -done - -echo "bug could not be reproduced" -exit 0 -The bug itself seems to remain unfixed, as I was able to reproduce -that -with Fedora 41 guest, as well as AlmaLinux 8 guest. However our -cpr-transfer code also seems to be buggy as it triggers the crash - -without the cpr-transfer migration the above reproduce doesn't -lead to -crash on the source VM. - -I suspect that, as cpr-transfer doesn't migrate the guest -memory, but -rather passes it through the memory backend object, our code might -somehow corrupt the VRAM.  However, I wasn't able to trace the -corruption so far. - -Could somebody help the investigation and take a look into -this?  Any -suggestions would be appreciated.  Thanks! -Possibly some memory region created by qxl is not being preserved. -Try adding these traces to see what is preserved: - --trace enable='*cpr*' --trace enable='*ram_alloc*' -Also try adding this patch to see if it flags any ram blocks as not -compatible with cpr.  A message is printed at migration start time. 
-    -https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- -email- -steven.sistare@oracle.com/ - -- Steve -With the traces enabled + the "migration: ram block cpr blockers" -patch -applied: - -Source: -cpr_find_fd pc.bios, id 0 returns -1 -cpr_save_fd pc.bios, id 0, fd 22 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host -0x7fec18e00000 -cpr_find_fd pc.rom, id 0 returns -1 -cpr_save_fd pc.rom, id 0, fd 23 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host -0x7fec18c00000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 -cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 24 host 0x7fec18a00000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 25 host 0x7feb77e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 27 host 0x7fec18800000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 28 host 0x7feb73c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 -cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 34 host 0x7fec18600000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 -cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 35 host 0x7fec18200000 -cpr_find_fd /rom@etc/table-loader, id 0 returns -1 -cpr_save_fd /rom@etc/table-loader, id 0, fd 36 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 36 host 0x7feb8b600000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 -cpr_save_fd 
/rom@etc/acpi/rsdp, id 0, fd 37 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -37 host 0x7feb8b400000 - -cpr_state_save cpr-transfer mode -cpr_transfer_output /var/run/alma8cpr-dst.sock -Target: -cpr_transfer_input /var/run/alma8cpr-dst.sock -cpr_state_load cpr-transfer mode -cpr_find_fd pc.bios, id 0 returns 20 -qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host -0x7fcdc9800000 -cpr_find_fd pc.rom, id 0 returns 19 -qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host -0x7fcdc9600000 -cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 -qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size -262144 fd 18 host 0x7fcdc9400000 -cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 -qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size -67108864 fd 17 host 0x7fcd27e00000 -cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 -fd 16 host 0x7fcdc9200000 -cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 -qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size -67108864 fd 15 host 0x7fcd23c00000 -cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 -qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 -fd 14 host 0x7fcdc8800000 -cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 -qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size -2097152 fd 13 host 0x7fcdc8400000 -cpr_find_fd /rom@etc/table-loader, id 0 returns 11 -qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 -fd 11 host 0x7fcdc8200000 -cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 -qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd -10 host 0x7fcd3be00000 -Looks like both vga.vram and qxl.vram are being preserved (with the -same -addresses), and no incompatible ram blocks are found during migration. -Sorry, addressed are not the same, of course.  
However corresponding -ram -blocks do seem to be preserved and initialized. -So far, I have not reproduced the guest driver failure. - -However, I have isolated places where new QEMU improperly writes to -the qxl memory regions prior to starting the guest, by mmap'ing them -readonly after cpr: - -    qemu_ram_alloc_internal() -      if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) -          ram_flags |= RAM_READONLY; -      new_block = qemu_ram_alloc_from_fd(...) - -I have attached a draft fix; try it and let me know. -My console window looks fine before and after cpr, using --vnc $hostip:0 -vga qxl - -- Steve -Regarding the reproduce: when I launch the buggy version with the same -options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, -my VNC client silently hangs on the target after a while.  Could it -happen on your stand as well? -cpr does not preserve the vnc connection and session.  To test, I specify -port 0 for the source VM and port 1 for the dest.  When the src vnc goes -dormant the dest vnc becomes active. -Sure, I meant that VNC on the dest (on the port 1) works for a while -after the migration and then hangs, apparently after the guest QXL crash. -Could you try launching VM with -"-nographic -device qxl-vga"?  That way VM's serial console is given you -directly in the shell, so when qxl driver crashes you're still able to -inspect the kernel messages. -I have been running like that, but have not reproduced the qxl driver -crash, -and I suspect my guest image+kernel is too old. -Yes, that's probably the case.  But the crash occurs on my Fedora 41 -guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to -be buggy. -However, once I realized the -issue was post-cpr modification of qxl memory, I switched my attention -to the -fix. -As for your patch, I can report that it doesn't resolve the issue as it -is.  
But I was able to track down another possible memory corruption -using your approach with readonly mmap'ing: -Program terminated with signal SIGSEGV, Segmentation fault. -#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -412         d->ram->magic       = cpu_to_le32(QXL_RAM_MAGIC); -[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] -(gdb) bt -#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 -#1  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, -errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 -#2  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, -errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 -#3  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, -errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 -#4  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, -value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 -#5  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, -v=0x5638996f3770, name=0x56389759b141 "realized", -opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) -      at ../qom/object.c:2374 -#6  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, -name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) -      at ../qom/object.c:1449 -#7  0x00005638970f8586 in object_property_set_qobject -(obj=0x5638996e0e70, name=0x56389759b141 "realized", -value=0x5638996df900, errp=0x7ffd3c2b84e0) -      at ../qom/qom-qobject.c:28 -#8  0x00005638970f3d8d in object_property_set_bool -(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, -errp=0x7ffd3c2b84e0) -      at ../qom/object.c:1519 -#9  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, -bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 -#10 0x0000563896dba675 in qdev_device_add_from_qdict -(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ -system/qdev-monitor.c:714 -#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, -errp=0x56389855dc40 ) at 
../system/qdev-monitor.c:733 -#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, -opts=0x563898786150, errp=0x56389855dc40 ) at ../system/ -vl.c:1207 -#13 0x000056389737a6cc in qemu_opts_foreach -      (list=0x563898427b60 , func=0x563896dc48ca -, opaque=0x0, errp=0x56389855dc40 ) -      at ../util/qemu-option.c:1135 -#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ -vl.c:2745 -#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 -) at ../system/vl.c:2806 -#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) -at ../system/vl.c:3838 -#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ -system/main.c:72 -So the attached adjusted version of your patch does seem to help.  At -least I can't reproduce the crash on my stand. -Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram -are -definitely harmful.  Try V2 of the patch, attached, which skips the lines -of init_qxl_ram that modify guest memory. -Thanks, your v2 patch does seem to prevent the crash.  Would you re-send -it to the list as a proper fix? -Yes.  Was waiting for your confirmation. -I'm wondering, could it be useful to explicitly mark all the reused -memory regions readonly upon cpr-transfer, and then make them writable -back again after the migration is done?  That way we will be segfaulting -early on instead of debugging tricky memory corruptions. -It's a useful debugging technique, but changing protection on a large -memory region -can be too expensive for production due to TLB shootdowns. - -Also, there are cases where writes are performed but the value is -guaranteed to -be the same: -   qxl_post_load() -     qxl_set_mode() -       d->rom->mode = cpu_to_le32(modenr); -The value is the same because mode and shadow_rom.mode were passed in -vmstate -from old qemu. -There're also cases where devices' ROM might be re-initialized.  E.g. 
-this segfault occures upon further exploration of RO mapped RAM blocks: -Program terminated with signal SIGSEGV, Segmentation fault. -#0  __memmove_avx_unaligned_erms () at -../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 -664             rep     movsb -[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))] -(gdb) bt -#0  __memmove_avx_unaligned_erms () at -../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 -#1  0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, -name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true) -     at ../hw/core/loader.c:1032 -#2  0x000055aa1d031577 in rom_add_blob -     (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, -addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", -fw_callback=0x55aa1d441f59 , callback_opaque=0x55aa20ff0010, as=0x0, -read_only=true) at ../hw/core/loader.c:1147 -#3  0x000055aa1cfd788d in acpi_add_rom_blob -     (update=0x55aa1d441f59 , opaque=0x55aa20ff0010, -blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46 -#4  0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720 -#5  0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0) -at ../hw/i386/pc.c:638 -#6  0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 -, data=0x0) at ../util/notify.c:39 -#7  0x000055aa1d039ee5 in qdev_machine_creation_done () at -../hw/core/machine.c:1749 -#8  0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 -) at ../system/vl.c:2779 -#9  0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 -) at ../system/vl.c:2807 -#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at -../system/vl.c:3838 -#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at -../system/main.c:72 -I'm not sure whether ACPI tables ROM in particular is rewritten with the -same content, but there might be cases where ROM can be read 
from file -system upon initialization.  That is undesirable as guest kernel -certainly won't be too happy about sudden change of the device's ROM -content. - -So the issue we're dealing with here is any unwanted memory related -device initialization upon cpr. - -For now the only thing that comes to my mind is to make a test where we -put as many devices as we can into a VM, make ram blocks RO upon cpr -(and remap them as RW later after migration is done, if needed), and -catch any unwanted memory violations.  As Den suggested, we might -consider adding that behaviour as a separate non-default option (or -"migrate" command flag specific to cpr-transfer), which would only be -used in the testing. -I'll look into adding an option, but there may be too many false positives, -such as the qxl_set_mode case above.  And the maintainers may object to me -eliminating the false positives by adding more CPR_IN tests, due to gratuitous -(from their POV) ugliness. - -But I will use the technique to look for more write violations. -Andrey -No way. ACPI with the source must be used in the same way as BIOSes -and optional ROMs. -Yup, its a bug.  Will fix. - -- Steve - diff --git a/classification_output/01/mistranslation/6178292 b/classification_output/01/mistranslation/6178292 deleted file mode 100644 index f13db3b8..00000000 --- a/classification_output/01/mistranslation/6178292 +++ /dev/null @@ -1,258 +0,0 @@ -mistranslation: 0.930 -semantic: 0.928 -instruction: 0.905 -other: 0.890 - -[BUG][RFC] CPR transfer Issues: Socket permissions and PID files - -Hello, - -While testing CPR transfer I encountered two issues. The first is that the -transfer fails when running with pidfiles due to the destination qemu process -attempting to create the pidfile while it is still locked by the source -process. The second is that the transfer fails when running with the -run-with -user=$USERID parameter. 
This is because the destination qemu process creates -the UNIX sockets used for the CPR transfer before dropping to the lower -permissioned user, which causes them to be owned by the original user. The -source qemu process then does not have permission to connect to it because it -is already running as the lesser permissioned user. - -Reproducing the first issue: - -Create a source and destination qemu instance associated with the same VM where -both processes have the -pidfile parameter passed on the command line. You -should see the following error on the command line of the second process: - -qemu-system-x86_64: cannot create PID file: Cannot lock pid file: Resource -temporarily unavailable - -Reproducing the second issue: - -Create a source and destination qemu instance associated with the same VM where -both processes have -run-with user=$USERID passed on the command line, where -$USERID is a different user from the one launching the processes. Then attempt -a CPR transfer using UNIX sockets for the main and cpr sockets. You should -receive the following error via QMP: -{"error": {"class": "GenericError", "desc": "Failed to connect to 'cpr.sock': -Permission denied"}} - -I provided a minimal patch that works around the second issue. 
- -Thank you, -Ben Chaney - ---- -include/system/os-posix.h | 4 ++++ -os-posix.c | 8 -------- -util/qemu-sockets.c | 21 +++++++++++++++++++++ -3 files changed, 25 insertions(+), 8 deletions(-) - -diff --git a/include/system/os-posix.h b/include/system/os-posix.h -index ce5b3bccf8..2a414a914a 100644 ---- a/include/system/os-posix.h -+++ b/include/system/os-posix.h -@@ -55,6 +55,10 @@ void os_setup_limits(void); -void os_setup_post(void); -int os_mlock(bool on_fault); - -+extern struct passwd *user_pwd; -+extern uid_t user_uid; -+extern gid_t user_gid; -+ -/** -* qemu_alloc_stack: -* @sz: pointer to a size_t holding the requested usable stack size -diff --git a/os-posix.c b/os-posix.c -index 52925c23d3..9369b312a0 100644 ---- a/os-posix.c -+++ b/os-posix.c -@@ -86,14 +86,6 @@ void os_set_proc_name(const char *s) -} - - --/* -- * Must set all three of these at once. -- * Legal combinations are unset by name by uid -- */ --static struct passwd *user_pwd; /* NULL non-NULL NULL */ --static uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ --static gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ -- -/* -* Prepare to change user ID. user_id can be one of 3 forms: -* - a username, in which case user ID will be changed to its uid, -diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c -index 77477c1cd5..987977ead9 100644 ---- a/util/qemu-sockets.c -+++ b/util/qemu-sockets.c -@@ -871,6 +871,14 @@ static bool saddr_is_tight(UnixSocketAddress *saddr) -#endif -} - -+/* -+ * Must set all three of these at once. 
-+ * Legal combinations are unset by name by uid -+ */ -+struct passwd *user_pwd; /* NULL non-NULL NULL */ -+uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ -+gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ -+ -static int unix_listen_saddr(UnixSocketAddress *saddr, -int num, -Error **errp) -@@ -947,6 +955,19 @@ static int unix_listen_saddr(UnixSocketAddress *saddr, -error_setg_errno(errp, errno, "Failed to bind socket to %s", path); -goto err; -} -+ if (user_pwd) { -+ if (chown(un.sun_path, user_pwd->pw_uid, user_pwd->pw_gid) < 0) { -+ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", -path); -+ goto err; -+ } -+ } -+ else if (user_uid != -1 && user_gid != -1) { -+ if (chown(un.sun_path, user_uid, user_gid) < 0) { -+ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", -path); -+ goto err; -+ } -+ } -+ -if (listen(sock, num) < 0) { -error_setg_errno(errp, errno, "Failed to listen on socket"); -goto err; --- -2.40.1 - -Thank you Ben. I appreciate you testing CPR and shaking out the bugs. -I will study these and propose patches. - -My initial reaction to the pidfile issue is that the orchestration layer must -pass a different filename when starting the destination qemu instance. When -using live update without containers, these types of resource conflicts in the -global namespaces are a known issue. - -- Steve - -On 3/14/2025 2:33 PM, Chaney, Ben wrote: -Hello, - -While testing CPR transfer I encountered two issues. The first is that the -transfer fails when running with pidfiles due to the destination qemu process -attempting to create the pidfile while it is still locked by the source -process. The second is that the transfer fails when running with the -run-with -user=$USERID parameter. This is because the destination qemu process creates -the UNIX sockets used for the CPR transfer before dropping to the lower -permissioned user, which causes them to be owned by the original user. 
The -source qemu process then does not have permission to connect to it because it -is already running as the lesser permissioned user. - -Reproducing the first issue: - -Create a source and destination qemu instance associated with the same VM where -both processes have the -pidfile parameter passed on the command line. You -should see the following error on the command line of the second process: - -qemu-system-x86_64: cannot create PID file: Cannot lock pid file: Resource -temporarily unavailable - -Reproducing the second issue: - -Create a source and destination qemu instance associated with the same VM where -both processes have -run-with user=$USERID passed on the command line, where -$USERID is a different user from the one launching the processes. Then attempt -a CPR transfer using UNIX sockets for the main and cpr sockets. You should -receive the following error via QMP: -{"error": {"class": "GenericError", "desc": "Failed to connect to 'cpr.sock': -Permission denied"}} - -I provided a minimal patch that works around the second issue. - -Thank you, -Ben Chaney - ---- -include/system/os-posix.h | 4 ++++ -os-posix.c | 8 -------- -util/qemu-sockets.c | 21 +++++++++++++++++++++ -3 files changed, 25 insertions(+), 8 deletions(-) - -diff --git a/include/system/os-posix.h b/include/system/os-posix.h -index ce5b3bccf8..2a414a914a 100644 ---- a/include/system/os-posix.h -+++ b/include/system/os-posix.h -@@ -55,6 +55,10 @@ void os_setup_limits(void); -void os_setup_post(void); -int os_mlock(bool on_fault); - -+extern struct passwd *user_pwd; -+extern uid_t user_uid; -+extern gid_t user_gid; -+ -/** -* qemu_alloc_stack: -* @sz: pointer to a size_t holding the requested usable stack size -diff --git a/os-posix.c b/os-posix.c -index 52925c23d3..9369b312a0 100644 ---- a/os-posix.c -+++ b/os-posix.c -@@ -86,14 +86,6 @@ void os_set_proc_name(const char *s) -} - - --/* -- * Must set all three of these at once. 
-- * Legal combinations are unset by name by uid -- */ --static struct passwd *user_pwd; /* NULL non-NULL NULL */ --static uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ --static gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ -- -/* -* Prepare to change user ID. user_id can be one of 3 forms: -* - a username, in which case user ID will be changed to its uid, -diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c -index 77477c1cd5..987977ead9 100644 ---- a/util/qemu-sockets.c -+++ b/util/qemu-sockets.c -@@ -871,6 +871,14 @@ static bool saddr_is_tight(UnixSocketAddress *saddr) -#endif -} - -+/* -+ * Must set all three of these at once. -+ * Legal combinations are unset by name by uid -+ */ -+struct passwd *user_pwd; /* NULL non-NULL NULL */ -+uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ -+gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ -+ -static int unix_listen_saddr(UnixSocketAddress *saddr, -int num, -Error **errp) -@@ -947,6 +955,19 @@ static int unix_listen_saddr(UnixSocketAddress *saddr, -error_setg_errno(errp, errno, "Failed to bind socket to %s", path); -goto err; -} -+ if (user_pwd) { -+ if (chown(un.sun_path, user_pwd->pw_uid, user_pwd->pw_gid) < 0) { -+ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", -path); -+ goto err; -+ } -+ } -+ else if (user_uid != -1 && user_gid != -1) { -+ if (chown(un.sun_path, user_uid, user_gid) < 0) { -+ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", -path); -+ goto err; -+ } -+ } -+ -if (listen(sock, num) < 0) { -error_setg_errno(errp, errno, "Failed to listen on socket"); -goto err; --- -2.40.1 - diff --git a/classification_output/01/mistranslation/64322995 b/classification_output/01/mistranslation/64322995 new file mode 100644 index 00000000..2f16ce87 --- /dev/null +++ b/classification_output/01/mistranslation/64322995 @@ -0,0 +1,54 @@ +mistranslation: 0.936 +semantic: 0.906 +other: 0.881 +instruction: 0.864 + +[Qemu-devel] [BUG] trace: QEMU hangs on initialization with the 
"simple" backend + +While starting the softmmu version of QEMU, the simple backend waits for the +writeout thread to signal a condition variable when initializing the output file +path. But since the writeout thread has not been created, it just waits forever. + +Thanks, + Lluis + +On Tue, Feb 09, 2016 at 09:24:04PM +0100, Lluís Vilanova wrote: +> +While starting the softmmu version of QEMU, the simple backend waits for the +> +writeout thread to signal a condition variable when initializing the output +> +file +> +path. But since the writeout thread has not been created, it just waits +> +forever. +Denis Lunev posted a fix: +https://patchwork.ozlabs.org/patch/580968/ +Stefan +signature.asc +Description: +PGP signature + +Stefan Hajnoczi writes: + +> +On Tue, Feb 09, 2016 at 09:24:04PM +0100, Lluís Vilanova wrote: +> +> While starting the softmmu version of QEMU, the simple backend waits for the +> +> writeout thread to signal a condition variable when initializing the output +> +> file +> +> path. But since the writeout thread has not been created, it just waits +> +> forever. +> +Denis Lunev posted a fix: +> +https://patchwork.ozlabs.org/patch/580968/ +Great, thanks. + +Lluis + diff --git a/classification_output/01/mistranslation/6866700 b/classification_output/01/mistranslation/6866700 deleted file mode 100644 index 2f16ce87..00000000 --- a/classification_output/01/mistranslation/6866700 +++ /dev/null @@ -1,54 +0,0 @@ -mistranslation: 0.936 -semantic: 0.906 -other: 0.881 -instruction: 0.864 - -[Qemu-devel] [BUG] trace: QEMU hangs on initialization with the "simple" backend - -While starting the softmmu version of QEMU, the simple backend waits for the -writeout thread to signal a condition variable when initializing the output file -path. But since the writeout thread has not been created, it just waits forever. 
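The hang pattern reported here is the classic lost wakeup: blocking on a condition variable that nobody is running to signal. The usual safe shape — create the signalling thread before anyone waits, and guard the wait with a predicate — looks like this (a toy model of the handshake, not the trace backend's actual code):

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy version of the init handshake that hung: the fix is (a) create the
 * signalling thread before anyone waits, and (b) guard the wait with a
 * predicate so a signal sent early is never lost. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool ready = false;

static void *writeout_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    ready = true;                    /* state change under the lock */
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int wait_for_writeout(void)
{
    pthread_t th;

    if (pthread_create(&th, NULL, writeout_thread, NULL) != 0) {
        return -1;
    }
    pthread_mutex_lock(&lock);
    while (!ready) {                 /* predicate: no lost wakeup, no hang */
        pthread_cond_wait(&cond, &lock);
    }
    pthread_mutex_unlock(&lock);
    pthread_join(th, NULL);
    return 0;
}
```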
- -Thanks, - Lluis - -On Tue, Feb 09, 2016 at 09:24:04PM +0100, Lluís Vilanova wrote: -> -While starting the softmmu version of QEMU, the simple backend waits for the -> -writeout thread to signal a condition variable when initializing the output -> -file -> -path. But since the writeout thread has not been created, it just waits -> -forever. -Denis Lunev posted a fix: -https://patchwork.ozlabs.org/patch/580968/ -Stefan -signature.asc -Description: -PGP signature - -Stefan Hajnoczi writes: - -> -On Tue, Feb 09, 2016 at 09:24:04PM +0100, Lluís Vilanova wrote: -> -> While starting the softmmu version of QEMU, the simple backend waits for the -> -> writeout thread to signal a condition variable when initializing the output -> -> file -> -> path. But since the writeout thread has not been created, it just waits -> -> forever. -> -Denis Lunev posted a fix: -> -https://patchwork.ozlabs.org/patch/580968/ -Great, thanks. - -Lluis - diff --git a/classification_output/01/mistranslation/70294255 b/classification_output/01/mistranslation/70294255 new file mode 100644 index 00000000..67353acd --- /dev/null +++ b/classification_output/01/mistranslation/70294255 @@ -0,0 +1,1061 @@ +mistranslation: 0.862 +semantic: 0.858 +instruction: 0.856 +other: 0.852 + +[Qemu-devel] 答复: Re: 答复: Re: 答复: Re: 答复: Re: [BUG]COLO failover hang + +hi: + +yes.it is better. + +And should we delete + + + + +#ifdef WIN32 + + QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL) + +#endif + + + + +in qio_channel_socket_accept? + +qio_channel_socket_new already have it. 
+ + + + + + + + + + + + +原始邮件 + + + +发件人: address@hidden +收件人:王广10165992 +抄送人: address@hidden address@hidden address@hidden address@hidden +日 期 :2017å¹´03月22日 15:03 +主 题 :Re: [Qemu-devel] 答复: Re: 答复: Re: 答复: Re: [BUG]COLO failover hang + + + + + +Hi, + +On 2017/3/22 9:42, address@hidden wrote: +> diff --git a/migration/socket.c b/migration/socket.c +> +> +> index 13966f1..d65a0ea 100644 +> +> +> --- a/migration/socket.c +> +> +> +++ b/migration/socket.c +> +> +> @@ -147,8 +147,9 @@ static gboolean +socket_accept_incoming_migration(QIOChannel *ioc, +> +> +> } +> +> +> +> +> +> trace_migration_socket_incoming_accepted() +> +> +> +> +> +> qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") +> +> +> + qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) +> +> +> migration_channel_process_incoming(migrate_get_current(), +> +> +> QIO_CHANNEL(sioc)) +> +> +> object_unref(OBJECT(sioc)) +> +> +> +> +> Is this patch ok? +> + +Yes, i think this works, but a better way maybe to call +qio_channel_set_feature() +in qio_channel_socket_accept(), we didn't set the SHUTDOWN feature for the +socket accept fd, +Or fix it by this: + +diff --git a/io/channel-socket.c b/io/channel-socket.c +index f546c68..ce6894c 100644 +--- a/io/channel-socket.c ++++ b/io/channel-socket.c +@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc, + Error **errp) + { + QIOChannelSocket *cioc +- +- cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)) +- cioc->fd = -1 ++ ++ cioc = qio_channel_socket_new() + cioc->remoteAddrLen = sizeof(ioc->remoteAddr) + cioc->localAddrLen = sizeof(ioc->localAddr) + + +Thanks, +Hailiang + +> I have test it . The test could not hang any more. 
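The mechanism behind this fix can be demonstrated outside QEMU: qemu_file_shutdown() only helps because shutdown(2) on a socket forces a recv() blocked in another thread to return, and the channel layer only issues that shutdown(2) once QIO_CHANNEL_FEATURE_SHUTDOWN is set on the fd. A standalone sketch (illustrative names, not QEMU code):

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Stands in for the COLO incoming thread stuck in recvmsg(). */
static void *reader_thread(void *arg)
{
    int fd = *(int *)arg;
    char buf[16];
    /* Blocks until data arrives -- or until shutdown() below,
     * which makes it return 0 (orderly EOF). */
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    return (void *)(long)n;
}

/* Returns 0 when the blocked reader was successfully unblocked. */
static int failover_unblocks_reader(void)
{
    int fds[2];
    pthread_t tid;
    void *ret;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        return -1;
    }
    if (pthread_create(&tid, NULL, reader_thread, &fds[0]) != 0) {
        return -1;
    }
    usleep(100 * 1000);            /* let the reader enter recv() */
    shutdown(fds[0], SHUT_RDWR);   /* the qemu_file_shutdown() analogue */
    pthread_join(tid, &ret);
    close(fds[0]);
    close(fds[1]);
    return (int)(long)ret;         /* 0: recv() saw EOF, thread exited */
}
```

Compile with -pthread. Without the shutdown() call the join never completes, which is exactly the hang seen when the accept-side channel lacks the SHUTDOWN feature.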
+> +> +> +> +> +> +> +> +> +> +> +> +> 原始邮件 +> +> +> +> 发件人: address@hidden +> 收件人: address@hidden address@hidden +> 抄送人: address@hidden address@hidden address@hidden +> 日 期 :2017å¹´03月22日 09:11 +> 主 题 :Re: [Qemu-devel] 答复: Re: 答复: Re: [BUG]COLO failover hang +> +> +> +> +> +> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: +> > * Hailiang Zhang (address@hidden) wrote: +> >> Hi, +> >> +> >> Thanks for reporting this, and i confirmed it in my test, and it is a bug. +> >> +> >> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in +> >> case COLO thread/incoming thread is stuck in read/write() while do +failover, +> >> but it didn't take effect, because all the fd used by COLO (also migration) +> >> has been wrapped by qio channel, and it will not call the shutdown API if +> >> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN). +> >> +> >> Cc: Dr. David Alan Gilbert address@hidden +> >> +> >> I doubted migration cancel has the same problem, it may be stuck in write() +> >> if we tried to cancel migration. +> >> +> >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, +Error **errp) +> >> { +> >> qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing") +> >> migration_channel_connect(s, ioc, NULL) +> >> ... ... +> >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN) above, +> >> and the +> >> migrate_fd_cancel() +> >> { +> >> ... ... +> >> if (s->state == MIGRATION_STATUS_CANCELLING && f) { +> >> qemu_file_shutdown(f) --> This will not take effect. No ? +> >> } +> >> } +> > +> > (cc'd in Daniel Berrange). +> > I see that we call qio_channel_set_feature(ioc, +QIO_CHANNEL_FEATURE_SHUTDOWN) at the +> > top of qio_channel_socket_new so I think that's safe isn't it? +> > +> +> Hmm, you are right, this problem is only exist for the migration incoming fd, +thanks. 
+> +> > Dave +> > +> >> Thanks, +> >> Hailiang +> >> +> >> On 2017/3/21 16:10, address@hidden wrote: +> >>> Thank you。 +> >>> +> >>> I have test aready。 +> >>> +> >>> When the Primary Node panic,the Secondary Node qemu hang at the same +place。 +> >>> +> >>> Incorrding +http://wiki.qemu-project.org/Features/COLO +,kill Primary Node +qemu will not produce the problem,but Primary Node panic can。 +> >>> +> >>> I think due to the feature of channel does not support +QIO_CHANNEL_FEATURE_SHUTDOWN. +> >>> +> >>> +> >>> when failover,channel_shutdown could not shut down the channel. +> >>> +> >>> +> >>> so the colo_process_incoming_thread will hang at recvmsg. +> >>> +> >>> +> >>> I test a patch: +> >>> +> >>> +> >>> diff --git a/migration/socket.c b/migration/socket.c +> >>> +> >>> +> >>> index 13966f1..d65a0ea 100644 +> >>> +> >>> +> >>> --- a/migration/socket.c +> >>> +> >>> +> >>> +++ b/migration/socket.c +> >>> +> >>> +> >>> @@ -147,8 +147,9 @@ static gboolean +socket_accept_incoming_migration(QIOChannel *ioc, +> >>> +> >>> +> >>> } +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> trace_migration_socket_incoming_accepted() +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> qio_channel_set_name(QIO_CHANNEL(sioc), +"migration-socket-incoming") +> >>> +> >>> +> >>> + qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN) +> >>> +> >>> +> >>> migration_channel_process_incoming(migrate_get_current(), +> >>> +> >>> +> >>> QIO_CHANNEL(sioc)) +> >>> +> >>> +> >>> object_unref(OBJECT(sioc)) +> >>> +> >>> +> >>> +> >>> +> >>> My test will not hang any more. +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> 原始邮件 +> >>> +> >>> +> >>> +> >>> 发件人: address@hidden +> >>> 收件人:王广10165992 address@hidden +> >>> 抄送人: address@hidden address@hidden +> >>> 日 期 :2017å¹´03月21日 15:58 +> >>> 主 题 :Re: [Qemu-devel] 答复: Re: [BUG]COLO failover hang +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> Hi,Wang. 
+> >>> +> >>> You can test this branch: +> >>> +> >>> +https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk +> >>> +> >>> and please follow wiki ensure your own configuration correctly. +> >>> +> >>> +http://wiki.qemu-project.org/Features/COLO +> >>> +> >>> +> >>> Thanks +> >>> +> >>> Zhang Chen +> >>> +> >>> +> >>> On 03/21/2017 03:27 PM, address@hidden wrote: +> >>> > +> >>> > hi. +> >>> > +> >>> > I test the git qemu master have the same problem. +> >>> > +> >>> > (gdb) bt +> >>> > +> >>> > #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, +> >>> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 +> >>> > +> >>> > #1 0x00007f658e4aa0c2 in qio_channel_read +> >>> > (address@hidden, address@hidden "", +> >>> > address@hidden, address@hidden) at io/channel.c:114 +> >>> > +> >>> > #2 0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>, +> >>> > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at +> >>> > migration/qemu-file-channel.c:78 +> >>> > +> >>> > #3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at +> >>> > migration/qemu-file.c:295 +> >>> > +> >>> > #4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, +> >>> > address@hidden) at migration/qemu-file.c:555 +> >>> > +> >>> > #5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at +> >>> > migration/qemu-file.c:568 +> >>> > +> >>> > #6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at +> >>> > migration/qemu-file.c:648 +> >>> > +> >>> > #7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, +> >>> > address@hidden) at migration/colo.c:244 +> >>> > +> >>> > #8 0x00007f658e3e681e in colo_receive_check_message (f=<optimized +> >>> > out>, address@hidden, +> >>> > address@hidden) +> >>> > +> >>> > at migration/colo.c:264 +> >>> > +> >>> > #9 0x00007f658e3e740e in colo_process_incoming_thread +> >>> > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577 +> >>> > +> >>> > #10 
0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 +> >>> > +> >>> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6 +> >>> > +> >>> > (gdb) p ioc->name +> >>> > +> >>> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming" +> >>> > +> >>> > (gdb) p ioc->features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN +> >>> > +> >>> > $3 = 0 +> >>> > +> >>> > +> >>> > (gdb) bt +> >>> > +> >>> > #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, +> >>> > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 +> >>> > +> >>> > #1 0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at +> >>> > gmain.c:3054 +> >>> > +> >>> > #2 g_main_context_dispatch (context=<optimized out>, +> >>> > address@hidden) at gmain.c:3630 +> >>> > +> >>> > #3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 +> >>> > +> >>> > #4 os_host_main_loop_wait (timeout=<optimized out>) at +> >>> > util/main-loop.c:258 +> >>> > +> >>> > #5 main_loop_wait (address@hidden) at +> >>> > util/main-loop.c:506 +> >>> > +> >>> > #6 0x00007fdccb526187 in main_loop () at vl.c:1898 +> >>> > +> >>> > #7 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized +> >>> > out>) at vl.c:4709 +> >>> > +> >>> > (gdb) p ioc->features +> >>> > +> >>> > $1 = 6 +> >>> > +> >>> > (gdb) p ioc->name +> >>> > +> >>> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener" +> >>> > +> >>> > +> >>> > May be socket_accept_incoming_migration should +> >>> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? +> >>> > +> >>> > +> >>> > thank you. 
+> >>> > +> >>> > +> >>> > +> >>> > +> >>> > +> >>> > 原始邮件 +> >>> > address@hidden +> >>> > address@hidden +> >>> > address@hidden@huawei.com> +> >>> > *日 期 :*2017å¹´03月16日 14:46 +> >>> > *主 题 :**Re: [Qemu-devel] COLO failover hang* +> >>> > +> >>> > +> >>> > +> >>> > +> >>> > On 03/15/2017 05:06 PM, wangguang wrote: +> >>> > > am testing QEMU COLO feature described here [QEMU +> >>> > > Wiki]( +http://wiki.qemu-project.org/Features/COLO +). +> >>> > > +> >>> > > When the Primary Node panic,the Secondary Node qemu hang. +> >>> > > hang at recvmsg in qio_channel_socket_readv. +> >>> > > And I run { 'execute': 'nbd-server-stop' } and { "execute": +> >>> > > "x-colo-lost-heartbeat" } in Secondary VM's +> >>> > > monitor,the Secondary Node qemu still hang at recvmsg . +> >>> > > +> >>> > > I found that the colo in qemu is not complete yet. +> >>> > > Do the colo have any plan for development? +> >>> > +> >>> > Yes, We are developing. You can see some of patch we pushing. +> >>> > +> >>> > > Has anyone ever run it successfully? Any help is appreciated! +> >>> > +> >>> > In our internal version can run it successfully, +> >>> > The failover detail you can ask Zhanghailiang for help. 
+> >>> > Next time if you have some question about COLO, +> >>> > please cc me and zhanghailiang address@hidden +> >>> > +> >>> > +> >>> > Thanks +> >>> > Zhang Chen +> >>> > +> >>> > +> >>> > > +> >>> > > +> >>> > > +> >>> > > centos7.2+qemu2.7.50 +> >>> > > (gdb) bt +> >>> > > #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 +> >>> > > #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized +out>, +> >>> > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, +errp=0x0) at +> >>> > > io/channel-socket.c:497 +> >>> > > #2 0x00007f3e03329472 in qio_channel_read (address@hidden, +> >>> > > address@hidden "", address@hidden, +> >>> > > address@hidden) at io/channel.c:97 +> >>> > > #3 0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>, +> >>> > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at +> >>> > > migration/qemu-file-channel.c:78 +> >>> > > #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at +> >>> > > migration/qemu-file.c:257 +> >>> > > #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, +> >>> > > address@hidden) at migration/qemu-file.c:510 +> >>> > > #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at +> >>> > > migration/qemu-file.c:523 +> >>> > > #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at +> >>> > > migration/qemu-file.c:603 +> >>> > > #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, +> >>> > > address@hidden) at migration/colo.c:215 +> >>> > > #9 0x00007f3e0327250d in colo_wait_handle_message +(errp=0x7f3d62bfaa48, +> >>> > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at +> >>> > > migration/colo.c:546 +> >>> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at +> >>> > > migration/colo.c:649 +> >>> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 +> >>> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc..so.6 +> >>> > > +> >>> > > +> >>> > > +> >>> > > +> >>> > > +> >>> > > -- +> 
>>> > > View this message in context: +http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html +> >>> > > Sent from the Developer mailing list archive at Nabble.com. +> >>> > > +> >>> > > +> >>> > > +> >>> > > +> >>> > +> >>> > -- +> >>> > Thanks +> >>> > Zhang Chen +> >>> > +> >>> > +> >>> > +> >>> > +> >>> > +> >>> +> >> +> > -- +> > Dr. David Alan Gilbert / address@hidden / Manchester, UK +> > +> > . +> > +> + +On 2017/3/22 16:09, address@hidden wrote: +hi: + +yes.it is better. + +And should we delete +Yes, you are right. +#ifdef WIN32 + + QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL) + +#endif + + + + +in qio_channel_socket_accept? + +qio_channel_socket_new already have it. + + + + + + + + + + + + +原始邮件 + + + +发件人: address@hidden +收件人:王广10165992 +抄送人: address@hidden address@hidden address@hidden address@hidden +日 期 :2017å¹´03月22日 15:03 +主 题 :Re: [Qemu-devel] 答复: Re: 答复: Re: 答复: Re: [BUG]COLO failover hang + + + + + +Hi, + +On 2017/3/22 9:42, address@hidden wrote: +> diff --git a/migration/socket.c b/migration/socket.c +> +> +> index 13966f1..d65a0ea 100644 +> +> +> --- a/migration/socket.c +> +> +> +++ b/migration/socket.c +> +> +> @@ -147,8 +147,9 @@ static gboolean +socket_accept_incoming_migration(QIOChannel *ioc, +> +> +> } +> +> +> +> +> +> trace_migration_socket_incoming_accepted() +> +> +> +> +> +> qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") +> +> +> + qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) +> +> +> migration_channel_process_incoming(migrate_get_current(), +> +> +> QIO_CHANNEL(sioc)) +> +> +> object_unref(OBJECT(sioc)) +> +> +> +> +> Is this patch ok? 
+> + +Yes, i think this works, but a better way maybe to call +qio_channel_set_feature() +in qio_channel_socket_accept(), we didn't set the SHUTDOWN feature for the +socket accept fd, +Or fix it by this: + +diff --git a/io/channel-socket.c b/io/channel-socket.c +index f546c68..ce6894c 100644 +--- a/io/channel-socket.c ++++ b/io/channel-socket.c +@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc, + Error **errp) + { + QIOChannelSocket *cioc +- +- cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)) +- cioc->fd = -1 ++ ++ cioc = qio_channel_socket_new() + cioc->remoteAddrLen = sizeof(ioc->remoteAddr) + cioc->localAddrLen = sizeof(ioc->localAddr) + + +Thanks, +Hailiang + +> I have test it . The test could not hang any more. +> +> +> +> +> +> +> +> +> +> +> +> +> 原始邮件 +> +> +> +> 发件人: address@hidden +> 收件人: address@hidden address@hidden +> 抄送人: address@hidden address@hidden address@hidden +> 日 期 :2017å¹´03月22日 09:11 +> 主 题 :Re: [Qemu-devel] 答复: Re: 答复: Re: [BUG]COLO failover hang +> +> +> +> +> +> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: +> > * Hailiang Zhang (address@hidden) wrote: +> >> Hi, +> >> +> >> Thanks for reporting this, and i confirmed it in my test, and it is a bug. +> >> +> >> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in +> >> case COLO thread/incoming thread is stuck in read/write() while do +failover, +> >> but it didn't take effect, because all the fd used by COLO (also migration) +> >> has been wrapped by qio channel, and it will not call the shutdown API if +> >> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN). +> >> +> >> Cc: Dr. David Alan Gilbert address@hidden +> >> +> >> I doubted migration cancel has the same problem, it may be stuck in write() +> >> if we tried to cancel migration. 
+> >> +> >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, +Error **errp) +> >> { +> >> qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing") +> >> migration_channel_connect(s, ioc, NULL) +> >> ... ... +> >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN) above, +> >> and the +> >> migrate_fd_cancel() +> >> { +> >> ... ... +> >> if (s->state == MIGRATION_STATUS_CANCELLING && f) { +> >> qemu_file_shutdown(f) --> This will not take effect. No ? +> >> } +> >> } +> > +> > (cc'd in Daniel Berrange). +> > I see that we call qio_channel_set_feature(ioc, +QIO_CHANNEL_FEATURE_SHUTDOWN) at the +> > top of qio_channel_socket_new so I think that's safe isn't it? +> > +> +> Hmm, you are right, this problem is only exist for the migration incoming fd, +thanks. +> +> > Dave +> > +> >> Thanks, +> >> Hailiang +> >> +> >> On 2017/3/21 16:10, address@hidden wrote: +> >>> Thank you。 +> >>> +> >>> I have test aready。 +> >>> +> >>> When the Primary Node panic,the Secondary Node qemu hang at the same +place。 +> >>> +> >>> Incorrding +http://wiki.qemu-project.org/Features/COLO +,kill Primary Node +qemu will not produce the problem,but Primary Node panic can。 +> >>> +> >>> I think due to the feature of channel does not support +QIO_CHANNEL_FEATURE_SHUTDOWN. +> >>> +> >>> +> >>> when failover,channel_shutdown could not shut down the channel. +> >>> +> >>> +> >>> so the colo_process_incoming_thread will hang at recvmsg. 
+> >>> +> >>> +> >>> I test a patch: +> >>> +> >>> +> >>> diff --git a/migration/socket.c b/migration/socket.c +> >>> +> >>> +> >>> index 13966f1..d65a0ea 100644 +> >>> +> >>> +> >>> --- a/migration/socket.c +> >>> +> >>> +> >>> +++ b/migration/socket.c +> >>> +> >>> +> >>> @@ -147,8 +147,9 @@ static gboolean +socket_accept_incoming_migration(QIOChannel *ioc, +> >>> +> >>> +> >>> } +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> trace_migration_socket_incoming_accepted() +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> qio_channel_set_name(QIO_CHANNEL(sioc), +"migration-socket-incoming") +> >>> +> >>> +> >>> + qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN) +> >>> +> >>> +> >>> migration_channel_process_incoming(migrate_get_current(), +> >>> +> >>> +> >>> QIO_CHANNEL(sioc)) +> >>> +> >>> +> >>> object_unref(OBJECT(sioc)) +> >>> +> >>> +> >>> +> >>> +> >>> My test will not hang any more. +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> 原始邮件 +> >>> +> >>> +> >>> +> >>> 发件人: address@hidden +> >>> 收件人:王广10165992 address@hidden +> >>> 抄送人: address@hidden address@hidden +> >>> 日 期 :2017å¹´03月21日 15:58 +> >>> 主 题 :Re: [Qemu-devel] 答复: Re: [BUG]COLO failover hang +> >>> +> >>> +> >>> +> >>> +> >>> +> >>> Hi,Wang. +> >>> +> >>> You can test this branch: +> >>> +> >>> +https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk +> >>> +> >>> and please follow wiki ensure your own configuration correctly. +> >>> +> >>> +http://wiki.qemu-project.org/Features/COLO +> >>> +> >>> +> >>> Thanks +> >>> +> >>> Zhang Chen +> >>> +> >>> +> >>> On 03/21/2017 03:27 PM, address@hidden wrote: +> >>> > +> >>> > hi. +> >>> > +> >>> > I test the git qemu master have the same problem. 
+> >>> > +> >>> > (gdb) bt +> >>> > +> >>> > #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, +> >>> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 +> >>> > +> >>> > #1 0x00007f658e4aa0c2 in qio_channel_read +> >>> > (address@hidden, address@hidden "", +> >>> > address@hidden, address@hidden) at io/channel.c:114 +> >>> > +> >>> > #2 0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>, +> >>> > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at +> >>> > migration/qemu-file-channel.c:78 +> >>> > +> >>> > #3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at +> >>> > migration/qemu-file.c:295 +> >>> > +> >>> > #4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, +> >>> > address@hidden) at migration/qemu-file.c:555 +> >>> > +> >>> > #5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at +> >>> > migration/qemu-file.c:568 +> >>> > +> >>> > #6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at +> >>> > migration/qemu-file.c:648 +> >>> > +> >>> > #7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, +> >>> > address@hidden) at migration/colo.c:244 +> >>> > +> >>> > #8 0x00007f658e3e681e in colo_receive_check_message (f=<optimized +> >>> > out>, address@hidden, +> >>> > address@hidden) +> >>> > +> >>> > at migration/colo.c:264 +> >>> > +> >>> > #9 0x00007f658e3e740e in colo_process_incoming_thread +> >>> > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577 +> >>> > +> >>> > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 +> >>> > +> >>> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6 +> >>> > +> >>> > (gdb) p ioc->name +> >>> > +> >>> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming" +> >>> > +> >>> > (gdb) p ioc->features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN +> >>> > +> >>> > $3 = 0 +> >>> > +> >>> > +> >>> > (gdb) bt +> >>> > +> >>> > #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, +> >>> > condition=G_IO_IN, 
opaque=0x7fdcceeafa90) at migration/socket.c:137 +> >>> > +> >>> > #1 0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at +> >>> > gmain.c:3054 +> >>> > +> >>> > #2 g_main_context_dispatch (context=<optimized out>, +> >>> > address@hidden) at gmain.c:3630 +> >>> > +> >>> > #3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 +> >>> > +> >>> > #4 os_host_main_loop_wait (timeout=<optimized out>) at +> >>> > util/main-loop.c:258 +> >>> > +> >>> > #5 main_loop_wait (address@hidden) at +> >>> > util/main-loop.c:506 +> >>> > +> >>> > #6 0x00007fdccb526187 in main_loop () at vl.c:1898 +> >>> > +> >>> > #7 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized +> >>> > out>) at vl.c:4709 +> >>> > +> >>> > (gdb) p ioc->features +> >>> > +> >>> > $1 = 6 +> >>> > +> >>> > (gdb) p ioc->name +> >>> > +> >>> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener" +> >>> > +> >>> > +> >>> > May be socket_accept_incoming_migration should +> >>> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? +> >>> > +> >>> > +> >>> > thank you. +> >>> > +> >>> > +> >>> > +> >>> > +> >>> > +> >>> > 原始邮件 +> >>> > address@hidden +> >>> > address@hidden +> >>> > address@hidden@huawei.com> +> >>> > *日 期 :*2017å¹´03月16日 14:46 +> >>> > *主 题 :**Re: [Qemu-devel] COLO failover hang* +> >>> > +> >>> > +> >>> > +> >>> > +> >>> > On 03/15/2017 05:06 PM, wangguang wrote: +> >>> > > am testing QEMU COLO feature described here [QEMU +> >>> > > Wiki]( +http://wiki.qemu-project.org/Features/COLO +). +> >>> > > +> >>> > > When the Primary Node panic,the Secondary Node qemu hang. +> >>> > > hang at recvmsg in qio_channel_socket_readv. +> >>> > > And I run { 'execute': 'nbd-server-stop' } and { "execute": +> >>> > > "x-colo-lost-heartbeat" } in Secondary VM's +> >>> > > monitor,the Secondary Node qemu still hang at recvmsg . +> >>> > > +> >>> > > I found that the colo in qemu is not complete yet. +> >>> > > Do the colo have any plan for development? 
+> >>> > +> >>> > Yes, We are developing. You can see some of patch we pushing. +> >>> > +> >>> > > Has anyone ever run it successfully? Any help is appreciated! +> >>> > +> >>> > In our internal version can run it successfully, +> >>> > The failover detail you can ask Zhanghailiang for help. +> >>> > Next time if you have some question about COLO, +> >>> > please cc me and zhanghailiang address@hidden +> >>> > +> >>> > +> >>> > Thanks +> >>> > Zhang Chen +> >>> > +> >>> > +> >>> > > +> >>> > > +> >>> > > +> >>> > > centos7.2+qemu2.7.50 +> >>> > > (gdb) bt +> >>> > > #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 +> >>> > > #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized +out>, +> >>> > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, +errp=0x0) at +> >>> > > io/channel-socket.c:497 +> >>> > > #2 0x00007f3e03329472 in qio_channel_read (address@hidden, +> >>> > > address@hidden "", address@hidden, +> >>> > > address@hidden) at io/channel.c:97 +> >>> > > #3 0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>, +> >>> > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at +> >>> > > migration/qemu-file-channel.c:78 +> >>> > > #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at +> >>> > > migration/qemu-file.c:257 +> >>> > > #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, +> >>> > > address@hidden) at migration/qemu-file.c:510 +> >>> > > #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at +> >>> > > migration/qemu-file.c:523 +> >>> > > #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at +> >>> > > migration/qemu-file.c:603 +> >>> > > #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, +> >>> > > address@hidden) at migration/colo.c:215 +> >>> > > #9 0x00007f3e0327250d in colo_wait_handle_message +(errp=0x7f3d62bfaa48, +> >>> > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at +> >>> > > migration/colo.c:546 +> >>> > > #10 
colo_process_incoming_thread (opaque=0x7f3e067245e0) at +> >>> > > migration/colo.c:649 +> >>> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 +> >>> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc..so.6 +> >>> > > +> >>> > > +> >>> > > +> >>> > > +> >>> > > +> >>> > > -- +> >>> > > View this message in context: +http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html +> >>> > > Sent from the Developer mailing list archive at Nabble.com. +> >>> > > +> >>> > > +> >>> > > +> >>> > > +> >>> > +> >>> > -- +> >>> > Thanks +> >>> > Zhang Chen +> >>> > +> >>> > +> >>> > +> >>> > +> >>> > +> >>> +> >> +> > -- +> > Dr. David Alan Gilbert / address@hidden / Manchester, UK +> > +> > . +> > +> + diff --git a/classification_output/01/mistranslation/71456293 b/classification_output/01/mistranslation/71456293 new file mode 100644 index 00000000..746a624c --- /dev/null +++ b/classification_output/01/mistranslation/71456293 @@ -0,0 +1,1486 @@ +mistranslation: 0.659 +instruction: 0.624 +semantic: 0.600 +other: 0.598 + +[Qemu-devel][bug] qemu crash when migrate vm and vm's disks + +When migrate vm and vm’s disks target host qemu crash due to an invalid free. +#0  object_unref (obj=0x1000) at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/qom/object.c:920 +#1  0x0000560434d79e79 in memory_region_unref (mr=) +at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:1730 +#2  flatview_destroy (view=0x560439653880) at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:292 +#3  0x000056043514dfbe in call_rcu_thread (opaque=) +at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/util/rcu.c:284 +#4  0x00007fbc2b36fe25 in start_thread () from /lib64/libpthread.so.0 +#5  0x00007fbc2b099bad in clone () from /lib64/libc.so.6 +test base qemu-2.12.0 +, +but use lastest qemu(v6.0.0-rc2) also reproduce. 
+As follow patch can resolve this problem: +https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02272.html +Steps to reproduce: +(1) Create VM (virsh define) +(2) Add 64 virtio scsi disks +(3) migrate vm and vm’disks +------------------------------------------------------------------------------------------------------------------------------------- +本邮件及其附件含有新华三集团的保密信息,仅限于发送给上面地址中列出 +的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、 +或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本 +邮件! +This e-mail and its attachments contain confidential information from New H3C, which is +intended only for the person or entity whose address is listed above. Any use of the +information contained herein in any way (including, but not limited to, total or partial +disclosure, reproduction, or dissemination) by persons other than the intended +recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender +by phone or email immediately and delete it! + +* Yuchen (yu.chen@h3c.com) wrote: +> +When migrate vm and vm’s disks target host qemu crash due to an invalid free. +> +> +#0 object_unref (obj=0x1000) at +> +/qemu-2.12/rpmbuild/BUILD/qemu-2.12/qom/object.c:920 +> +#1 0x0000560434d79e79 in memory_region_unref (mr=) +> +at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:1730 +> +#2 flatview_destroy (view=0x560439653880) at +> +/qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:292 +> +#3 0x000056043514dfbe in call_rcu_thread (opaque=) +> +at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/util/rcu.c:284 +> +#4 0x00007fbc2b36fe25 in start_thread () from /lib64/libpthread.so.0 +> +#5 0x00007fbc2b099bad in clone () from /lib64/libc.so.6 +> +> +test base qemu-2.12.0,but use lastest qemu(v6.0.0-rc2) also reproduce. +Interesting. + +> +As follow patch can resolve this problem: +> +https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02272.html +That's a pci/rcu change; ccing Paolo and Micahel. 
+ +> +Steps to reproduce: +> +(1) Create VM (virsh define) +> +(2) Add 64 virtio scsi disks +Is that hot adding the disks later, or are they included in the VM at +creation? +Can you provide a libvirt XML example? + +> +(3) migrate vm and vm’disks +What do you mean by 'and vm disks' - are you doing a block migration? + +Dave + +> +------------------------------------------------------------------------------------------------------------------------------------- +> +本邮件及其附件含有新华三集团的保密信息,仅限于发送给上面地址中列出 +> +的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、 +> +或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本 +> +邮件! +> +This e-mail and its attachments contain confidential information from New +> +H3C, which is +> +intended only for the person or entity whose address is listed above. Any use +> +of the +> +information contained herein in any way (including, but not limited to, total +> +or partial +> +disclosure, reproduction, or dissemination) by persons other than the intended +> +recipient(s) is prohibited. If you receive this e-mail in error, please +> +notify the sender +> +by phone or email immediately and delete it! +-- +Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK + +> +-----邮件原件----- +> +发件人: Dr. David Alan Gilbert [ +mailto:dgilbert@redhat.com +] +> +发送时间: 2021å¹´4月8日 19:27 +> +收件人: yuchen (Cloud) ; pbonzini@redhat.com; +> +mst@redhat.com +> +抄送: qemu-devel@nongnu.org +> +主题: Re: [Qemu-devel][bug] qemu crash when migrate vm and vm's disks +> +> +* Yuchen (yu.chen@h3c.com) wrote: +> +> When migrate vm and vm’s disks target host qemu crash due to an invalid +> +free. 
+> +> +> +> #0 object_unref (obj=0x1000) at +> +> /qemu-2.12/rpmbuild/BUILD/qemu-2.12/qom/object.c:920 +> +> #1 0x0000560434d79e79 in memory_region_unref (mr=) +> +> at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:1730 +> +> #2 flatview_destroy (view=0x560439653880) at +> +> /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:292 +> +> #3 0x000056043514dfbe in call_rcu_thread (opaque=) +> +> at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/util/rcu.c:284 +> +> #4 0x00007fbc2b36fe25 in start_thread () from /lib64/libpthread.so.0 +> +> #5 0x00007fbc2b099bad in clone () from /lib64/libc.so.6 +> +> +> +> test base qemu-2.12.0,but use lastest qemu(v6.0.0-rc2) also reproduce. +> +> +Interesting. +> +> +> As follow patch can resolve this problem: +> +> +https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02272.html +> +> +That's a pci/rcu change; ccing Paolo and Micahel. +> +> +> Steps to reproduce: +> +> (1) Create VM (virsh define) +> +> (2) Add 64 virtio scsi disks +> +> +Is that hot adding the disks later, or are they included in the VM at +> +creation? +> +Can you provide a libvirt XML example? +> +Include disks in the VM at creation + +vm disks xml (only virtio scsi disks): + + + + + +
+[libvirt XML for the 64 virtio-scsi disks (and their controllers/addresses) elided: the markup was stripped by the mailing-list archive]
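+Since the archive stripped the XML markup, here is a representative virtio-scsi disk definition in libvirt domain XML. The file path, device names, and address values below are hypothetical placeholders, not the reporter's actual configuration:
+
+```xml
+<controller type='scsi' index='0' model='virtio-scsi'/>
+<disk type='file' device='disk'>
+  <driver name='qemu' type='qcow2'/>
+  <source file='/var/lib/libvirt/images/disk01.qcow2'/>
+  <target dev='sda' bus='scsi'/>
+  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
+</disk>
+```
+
+The reported setup presumably repeated such a block 64 times with incrementing target devices and unit numbers.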
+
+vm disks xml (only virtio disks):
+[libvirt XML for the virtio disks elided: the markup was stripped by the mailing-list archive]
+> > (3) migrate vm and vm's disks
+>
+> What do you mean by 'and vm disks' - are you doing a block migration?
+>
+Yes, block migration.
+In fact, it also reproduces when migrating only the domain.
+
+> Dave
+>
+> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
+
diff --git a/classification_output/01/mistranslation/74466963 b/classification_output/01/mistranslation/74466963
new file mode 100644
index 00000000..fffafcf7
--- /dev/null
+++ b/classification_output/01/mistranslation/74466963
@@ -0,0 +1,1878 @@
+mistranslation: 0.927
+instruction: 0.903
+semantic: 0.891
+other: 0.877
+
+[Qemu-devel] [TCG only][Migration Bug?] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration
+
+Hi all,
+
+Does anybody remember the similar issue posted by hailiang months ago?
+http://patchwork.ozlabs.org/patch/454322/
+At least two bugs about migration have been fixed since then.
+And now we have found the same issue with a TCG vm (KVM is fine): after
+migration, the content of the VM's memory is inconsistent.
+we add a patch to check the memory content; you can find it appended below
+
+steps to reproduce:
+1) apply the patch and re-build qemu
+2) prepare the ubuntu guest and run memtest in grub.
+source side:
+x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
+e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
+if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
+virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
+-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
+tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
+pc-i440fx-2.3,accel=tcg,usb=off
+destination side:
+x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
+e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
+if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
+virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
+-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
+tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
+pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881
+3) start migration
+with a 1000M NIC, migration will finish within 3 min.
+
+at source:
+(qemu) migrate tcp:192.168.2.66:8881
+after saving ram complete
+e9e725df678d392b1a83b3a917f332bb
+qemu-system-x86_64: end ram md5
+(qemu)
+
+at destination:
+...skip...
+Completed load of VM with exit code 0 seq iteration 1264
+Completed load of VM with exit code 0 seq iteration 1265
+Completed load of VM with exit code 0 seq iteration 1266
+qemu-system-x86_64: after loading state section id 2(ram)
+49c2dac7bde0e5e22db7280dcb3824f9
+qemu-system-x86_64: end ram md5
+qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init
+
+49c2dac7bde0e5e22db7280dcb3824f9
+qemu-system-x86_64: end ram md5
+
+This occurs occasionally and only on TCG machines. It seems that
+some pages dirtied on the source side are not transferred to the destination.
+This problem can be reproduced even if we disable virtio.
+Is it OK for some pages not to be transferred to the destination during
+migration? Or is it a bug?
+Any idea...
+
+=================md5 check patch=============================
+
+diff --git a/Makefile.target b/Makefile.target
+index 962d004..e2cb8e9 100644
+--- a/Makefile.target
++++ b/Makefile.target
+@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o
+ obj-y += memory_mapping.o
+ obj-y += dump.o
+ obj-y += migration/ram.o migration/savevm.o
+-LIBS := $(libs_softmmu) $(LIBS)
++LIBS := $(libs_softmmu) $(LIBS) -lplumb
+
+ # xen support
+ obj-$(CONFIG_XEN) += xen-common.o
+diff --git a/migration/ram.c b/migration/ram.c
+index 1eb155a..3b7a09d 100644
+--- a/migration/ram.c
++++ b/migration/ram.c
+@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int
+version_id)
+}
+
+ rcu_read_unlock();
+- DPRINTF("Completed load of VM with exit code %d seq iteration "
++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration "
+ "%" PRIu64 "\n", ret, seq_iter);
+ return ret;
+ }
+diff --git a/migration/savevm.c b/migration/savevm.c
+index 0ad1b93..3feaa61 100644
+--- a/migration/savevm.c
++++ b/migration/savevm.c
+@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f)
+
+ }
+
++#include "exec/ram_addr.h"
++#include "qemu/rcu_queue.h"
++#include <openssl/md5.h>
++#ifndef MD5_DIGEST_LENGTH
++#define MD5_DIGEST_LENGTH 16
++#endif
++
++static void check_host_md5(void)
++{
++ int i;
++ unsigned char md[MD5_DIGEST_LENGTH];
++ rcu_read_lock();
++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check
+'pc.ram' block */
++ rcu_read_unlock();
++
++ MD5(block->host, block->used_length, md);
++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
++ fprintf(stderr, "%02x", md[i]);
++ }
++ fprintf(stderr, "\n");
++ error_report("end ram md5");
++}
++
+ void qemu_savevm_state_begin(QEMUFile *f,
+ const MigrationParams *params)
+ {
+@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile
+*f, bool
iterable_only) +save_section_header(f, se, QEMU_VM_SECTION_END); + + ret = se->ops->save_live_complete_precopy(f, se->opaque); ++ ++ fprintf(stderr, "after saving %s complete\n", se->idstr); ++ check_host_md5(); ++ + trace_savevm_section_end(se->idstr, se->section_id, ret); + save_section_footer(f, se); + if (ret < 0) { +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +MigrationIncomingState *mis) +section_id, le->se->idstr); + return ret; + } ++ if (section_type == QEMU_VM_SECTION_END) { ++ error_report("after loading state section id %d(%s)", ++ section_id, le->se->idstr); ++ check_host_md5(); ++ } + if (!check_section_footer(f, le)) { + return -EINVAL; + } +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) + } + + cpu_synchronize_all_post_init(); ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); ++ check_host_md5(); + + return ret; + } + +* Li Zhijian (address@hidden) wrote: +> +Hi all, +> +> +Does anyboday remember the similar issue post by hailiang months ago +> +http://patchwork.ozlabs.org/patch/454322/ +> +At least tow bugs about migration had been fixed since that. +Yes, I wondered what happened to that. + +> +And now we found the same issue at the tcg vm(kvm is fine), after migration, +> +the content VM's memory is inconsistent. +Hmm, TCG only - I don't know much about that; but I guess something must +be accessing memory without using the proper macros/functions so +it doesn't mark it as dirty. + +> +we add a patch to check memory content, you can find it from affix +> +> +steps to reporduce: +> +1) apply the patch and re-build qemu +> +2) prepare the ubuntu guest and run memtest in grub. 
+> +soruce side: +> +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +pc-i440fx-2.3,accel=tcg,usb=off +> +> +destination side: +> +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 +> +> +3) start migration +> +with 1000M NIC, migration will finish within 3 min. +> +> +at source: +> +(qemu) migrate tcp:192.168.2.66:8881 +> +after saving ram complete +> +e9e725df678d392b1a83b3a917f332bb +> +qemu-system-x86_64: end ram md5 +> +(qemu) +> +> +at destination: +> +...skip... +> +Completed load of VM with exit code 0 seq iteration 1264 +> +Completed load of VM with exit code 0 seq iteration 1265 +> +Completed load of VM with exit code 0 seq iteration 1266 +> +qemu-system-x86_64: after loading state section id 2(ram) +> +49c2dac7bde0e5e22db7280dcb3824f9 +> +qemu-system-x86_64: end ram md5 +> +qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init +> +> +49c2dac7bde0e5e22db7280dcb3824f9 +> +qemu-system-x86_64: end ram md5 +> +> +This occurs occasionally and only at tcg machine. It seems that +> +some pages dirtied in source side don't transferred to destination. +> +This problem can be reproduced even if we disable virtio. 
+> +> +Is it OK for some pages that not transferred to destination when do +> +migration ? Or is it a bug? +I'm pretty sure that means it's a bug. Hard to find though, I guess +at least memtest is smaller than a big OS. I think I'd dump the whole +of memory on both sides, hexdump and diff them - I'd guess it would +just be one byte/word different, maybe that would offer some idea what +wrote it. + +Dave + +> +Any idea... +> +> +=================md5 check patch============================= +> +> +diff --git a/Makefile.target b/Makefile.target +> +index 962d004..e2cb8e9 100644 +> +--- a/Makefile.target +> ++++ b/Makefile.target +> +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o +> +obj-y += memory_mapping.o +> +obj-y += dump.o +> +obj-y += migration/ram.o migration/savevm.o +> +-LIBS := $(libs_softmmu) $(LIBS) +> ++LIBS := $(libs_softmmu) $(LIBS) -lplumb +> +> +# xen support +> +obj-$(CONFIG_XEN) += xen-common.o +> +diff --git a/migration/ram.c b/migration/ram.c +> +index 1eb155a..3b7a09d 100644 +> +--- a/migration/ram.c +> ++++ b/migration/ram.c +> +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +> +version_id) +> +} +> +> +rcu_read_unlock(); +> +- DPRINTF("Completed load of VM with exit code %d seq iteration " +> ++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " +> +"%" PRIu64 "\n", ret, seq_iter); +> +return ret; +> +} +> +diff --git a/migration/savevm.c b/migration/savevm.c +> +index 0ad1b93..3feaa61 100644 +> +--- a/migration/savevm.c +> ++++ b/migration/savevm.c +> +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) +> +> +} +> +> ++#include "exec/ram_addr.h" +> ++#include "qemu/rcu_queue.h" +> ++#include +> ++#ifndef MD5_DIGEST_LENGTH +> ++#define MD5_DIGEST_LENGTH 16 +> ++#endif +> ++ +> ++static void check_host_md5(void) +> ++{ +> ++ int i; +> ++ unsigned char md[MD5_DIGEST_LENGTH]; +> ++ rcu_read_lock(); +> ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +> +'pc.ram' block 
*/ +> ++ rcu_read_unlock(); +> ++ +> ++ MD5(block->host, block->used_length, md); +> ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { +> ++ fprintf(stderr, "%02x", md[i]); +> ++ } +> ++ fprintf(stderr, "\n"); +> ++ error_report("end ram md5"); +> ++} +> ++ +> +void qemu_savevm_state_begin(QEMUFile *f, +> +const MigrationParams *params) +> +{ +> +@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, +> +bool iterable_only) +> +save_section_header(f, se, QEMU_VM_SECTION_END); +> +> +ret = se->ops->save_live_complete_precopy(f, se->opaque); +> ++ +> ++ fprintf(stderr, "after saving %s complete\n", se->idstr); +> ++ check_host_md5(); +> ++ +> +trace_savevm_section_end(se->idstr, se->section_id, ret); +> +save_section_footer(f, se); +> +if (ret < 0) { +> +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +> +MigrationIncomingState *mis) +> +section_id, le->se->idstr); +> +return ret; +> +} +> ++ if (section_type == QEMU_VM_SECTION_END) { +> ++ error_report("after loading state section id %d(%s)", +> ++ section_id, le->se->idstr); +> ++ check_host_md5(); +> ++ } +> +if (!check_section_footer(f, le)) { +> +return -EINVAL; +> +} +> +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) +> +} +> +> +cpu_synchronize_all_post_init(); +> ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); +> ++ check_host_md5(); +> +> +return ret; +> +} +> +> +> +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +On 2015/12/3 17:24, Dr. David Alan Gilbert wrote: +* Li Zhijian (address@hidden) wrote: +Hi all, + +Does anyboday remember the similar issue post by hailiang months ago +http://patchwork.ozlabs.org/patch/454322/ +At least tow bugs about migration had been fixed since that. +Yes, I wondered what happened to that. +And now we found the same issue at the tcg vm(kvm is fine), after migration, +the content VM's memory is inconsistent. 
+Hmm, TCG only - I don't know much about that; but I guess something must +be accessing memory without using the proper macros/functions so +it doesn't mark it as dirty. +we add a patch to check memory content, you can find it from affix + +steps to reporduce: +1) apply the patch and re-build qemu +2) prepare the ubuntu guest and run memtest in grub. +soruce side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off + +destination side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 + +3) start migration +with 1000M NIC, migration will finish within 3 min. + +at source: +(qemu) migrate tcp:192.168.2.66:8881 +after saving ram complete +e9e725df678d392b1a83b3a917f332bb +qemu-system-x86_64: end ram md5 +(qemu) + +at destination: +...skip... 
+Completed load of VM with exit code 0 seq iteration 1264 +Completed load of VM with exit code 0 seq iteration 1265 +Completed load of VM with exit code 0 seq iteration 1266 +qemu-system-x86_64: after loading state section id 2(ram) +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 +qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init + +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 + +This occurs occasionally and only at tcg machine. It seems that +some pages dirtied in source side don't transferred to destination. +This problem can be reproduced even if we disable virtio. + +Is it OK for some pages that not transferred to destination when do +migration ? Or is it a bug? +I'm pretty sure that means it's a bug. Hard to find though, I guess +at least memtest is smaller than a big OS. I think I'd dump the whole +of memory on both sides, hexdump and diff them - I'd guess it would +just be one byte/word different, maybe that would offer some idea what +wrote it. +Maybe one better way to do that is with the help of userfaultfd's write-protect +capability. It is still in the development by Andrea Arcangeli, but there +is a RFC version available, please refer to +http://www.spinics.net/lists/linux-mm/msg97422.html +(I'm developing live memory snapshot which based on it, maybe this is another +scene where we +can use userfaultfd's WP ;) ). +Dave +Any idea... 
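+Hailiang's suggestion boils down to write-protecting guest RAM and trapping the first write to each page to find whoever dirties memory without marking it. userfaultfd-wp was still unmerged at the time; the same idea can be sketched with plain POSIX mprotect() and a SIGSEGV handler (a toy illustration on Linux, not QEMU code and not how userfaultfd-wp works internally):
+
+```c
+#include <assert.h>
+#include <signal.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <unistd.h>
+
+#define NPAGES 4
+
+static unsigned char *ram;
+static long page_size;
+static volatile sig_atomic_t page_dirty[NPAGES];
+
+/* First write to a write-protected page faults; record the page as
+ * dirty and make it writable again so the faulting store can retry. */
+static void wp_handler(int sig, siginfo_t *si, void *ctx)
+{
+    (void)sig; (void)ctx;
+    uintptr_t addr = (uintptr_t)si->si_addr;
+    uintptr_t base = (uintptr_t)ram;
+    if (addr < base || addr >= base + (uintptr_t)(NPAGES * page_size)) {
+        _exit(1);            /* a genuine crash, not one of our traps */
+    }
+    size_t page = (addr - base) / page_size;
+    page_dirty[page] = 1;
+    mprotect(ram + page * page_size, page_size, PROT_READ | PROT_WRITE);
+}
+
+/* Write-protect the whole "guest RAM" region and install the handler. */
+static void start_dirty_tracking(void)
+{
+    struct sigaction sa;
+    memset(&sa, 0, sizeof(sa));
+    sa.sa_flags = SA_SIGINFO;
+    sa.sa_sigaction = wp_handler;
+    sigaction(SIGSEGV, &sa, NULL);
+    memset((void *)page_dirty, 0, sizeof(page_dirty));
+    mprotect(ram, NPAGES * page_size, PROT_READ);
+}
+
+static void init_ram(void)
+{
+    page_size = sysconf(_SC_PAGESIZE);
+    ram = mmap(NULL, NPAGES * page_size, PROT_READ | PROT_WRITE,
+               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+    assert(ram != MAP_FAILED);
+}
+```
+
+Every write then leaves a trace, so a writer that bypasses the dirty-marking helpers would still show up in page_dirty, which is exactly the diagnostic value of the write-protect approach.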
+ +=================md5 check patch============================= + +diff --git a/Makefile.target b/Makefile.target +index 962d004..e2cb8e9 100644 +--- a/Makefile.target ++++ b/Makefile.target +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o + obj-y += memory_mapping.o + obj-y += dump.o + obj-y += migration/ram.o migration/savevm.o +-LIBS := $(libs_softmmu) $(LIBS) ++LIBS := $(libs_softmmu) $(LIBS) -lplumb + + # xen support + obj-$(CONFIG_XEN) += xen-common.o +diff --git a/migration/ram.c b/migration/ram.c +index 1eb155a..3b7a09d 100644 +--- a/migration/ram.c ++++ b/migration/ram.c +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +version_id) + } + + rcu_read_unlock(); +- DPRINTF("Completed load of VM with exit code %d seq iteration " ++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " + "%" PRIu64 "\n", ret, seq_iter); + return ret; + } +diff --git a/migration/savevm.c b/migration/savevm.c +index 0ad1b93..3feaa61 100644 +--- a/migration/savevm.c ++++ b/migration/savevm.c +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) + + } + ++#include "exec/ram_addr.h" ++#include "qemu/rcu_queue.h" ++#include ++#ifndef MD5_DIGEST_LENGTH ++#define MD5_DIGEST_LENGTH 16 ++#endif ++ ++static void check_host_md5(void) ++{ ++ int i; ++ unsigned char md[MD5_DIGEST_LENGTH]; ++ rcu_read_lock(); ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +'pc.ram' block */ ++ rcu_read_unlock(); ++ ++ MD5(block->host, block->used_length, md); ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { ++ fprintf(stderr, "%02x", md[i]); ++ } ++ fprintf(stderr, "\n"); ++ error_report("end ram md5"); ++} ++ + void qemu_savevm_state_begin(QEMUFile *f, + const MigrationParams *params) + { +@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, +bool iterable_only) + save_section_header(f, se, QEMU_VM_SECTION_END); + + ret = se->ops->save_live_complete_precopy(f, se->opaque); ++ ++ fprintf(stderr, "after saving %s 
complete\n", se->idstr); ++ check_host_md5(); ++ + trace_savevm_section_end(se->idstr, se->section_id, ret); + save_section_footer(f, se); + if (ret < 0) { +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +MigrationIncomingState *mis) + section_id, le->se->idstr); + return ret; + } ++ if (section_type == QEMU_VM_SECTION_END) { ++ error_report("after loading state section id %d(%s)", ++ section_id, le->se->idstr); ++ check_host_md5(); ++ } + if (!check_section_footer(f, le)) { + return -EINVAL; + } +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) + } + + cpu_synchronize_all_post_init(); ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); ++ check_host_md5(); + + return ret; + } +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +. + +On 12/03/2015 05:37 PM, Hailiang Zhang wrote: +On 2015/12/3 17:24, Dr. David Alan Gilbert wrote: +* Li Zhijian (address@hidden) wrote: +Hi all, + +Does anyboday remember the similar issue post by hailiang months ago +http://patchwork.ozlabs.org/patch/454322/ +At least tow bugs about migration had been fixed since that. +Yes, I wondered what happened to that. +And now we found the same issue at the tcg vm(kvm is fine), after +migration, +the content VM's memory is inconsistent. +Hmm, TCG only - I don't know much about that; but I guess something must +be accessing memory without using the proper macros/functions so +it doesn't mark it as dirty. +we add a patch to check memory content, you can find it from affix + +steps to reporduce: +1) apply the patch and re-build qemu +2) prepare the ubuntu guest and run memtest in grub. 
+soruce side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 + +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off + +destination side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 + +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 + +3) start migration +with 1000M NIC, migration will finish within 3 min. + +at source: +(qemu) migrate tcp:192.168.2.66:8881 +after saving ram complete +e9e725df678d392b1a83b3a917f332bb +qemu-system-x86_64: end ram md5 +(qemu) + +at destination: +...skip... +Completed load of VM with exit code 0 seq iteration 1264 +Completed load of VM with exit code 0 seq iteration 1265 +Completed load of VM with exit code 0 seq iteration 1266 +qemu-system-x86_64: after loading state section id 2(ram) +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 +qemu-system-x86_64: qemu_loadvm_state: after +cpu_synchronize_all_post_init + +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 + +This occurs occasionally and only at tcg machine. It seems that +some pages dirtied in source side don't transferred to destination. +This problem can be reproduced even if we disable virtio. + +Is it OK for some pages that not transferred to destination when do +migration ? Or is it a bug? +I'm pretty sure that means it's a bug. 
Hard to find though, I guess +at least memtest is smaller than a big OS. I think I'd dump the whole +of memory on both sides, hexdump and diff them - I'd guess it would +just be one byte/word different, maybe that would offer some idea what +wrote it. +Maybe one better way to do that is with the help of userfaultfd's +write-protect +capability. It is still in the development by Andrea Arcangeli, but there +is a RFC version available, please refer to +http://www.spinics.net/lists/linux-mm/msg97422.html +(I'm developing live memory snapshot which based on it, maybe this is +another scene where we +can use userfaultfd's WP ;) ). +sounds good. + +thanks +Li +Dave +Any idea... + +=================md5 check patch============================= + +diff --git a/Makefile.target b/Makefile.target +index 962d004..e2cb8e9 100644 +--- a/Makefile.target ++++ b/Makefile.target +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o + obj-y += memory_mapping.o + obj-y += dump.o + obj-y += migration/ram.o migration/savevm.o +-LIBS := $(libs_softmmu) $(LIBS) ++LIBS := $(libs_softmmu) $(LIBS) -lplumb + + # xen support + obj-$(CONFIG_XEN) += xen-common.o +diff --git a/migration/ram.c b/migration/ram.c +index 1eb155a..3b7a09d 100644 +--- a/migration/ram.c ++++ b/migration/ram.c +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +version_id) + } + + rcu_read_unlock(); +- DPRINTF("Completed load of VM with exit code %d seq iteration " ++ fprintf(stderr, "Completed load of VM with exit code %d seq +iteration " + "%" PRIu64 "\n", ret, seq_iter); + return ret; + } +diff --git a/migration/savevm.c b/migration/savevm.c +index 0ad1b93..3feaa61 100644 +--- a/migration/savevm.c ++++ b/migration/savevm.c +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) + + } + ++#include "exec/ram_addr.h" ++#include "qemu/rcu_queue.h" ++#include ++#ifndef MD5_DIGEST_LENGTH ++#define MD5_DIGEST_LENGTH 16 ++#endif ++ ++static void check_host_md5(void) ++{ ++ int i; ++ unsigned char 
md[MD5_DIGEST_LENGTH]; ++ rcu_read_lock(); ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +'pc.ram' block */ ++ rcu_read_unlock(); ++ ++ MD5(block->host, block->used_length, md); ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { ++ fprintf(stderr, "%02x", md[i]); ++ } ++ fprintf(stderr, "\n"); ++ error_report("end ram md5"); ++} ++ + void qemu_savevm_state_begin(QEMUFile *f, + const MigrationParams *params) + { +@@ -1056,6 +1079,10 @@ void +qemu_savevm_state_complete_precopy(QEMUFile *f, +bool iterable_only) + save_section_header(f, se, QEMU_VM_SECTION_END); + + ret = se->ops->save_live_complete_precopy(f, se->opaque); ++ ++ fprintf(stderr, "after saving %s complete\n", se->idstr); ++ check_host_md5(); ++ + trace_savevm_section_end(se->idstr, se->section_id, ret); + save_section_footer(f, se); + if (ret < 0) { +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +MigrationIncomingState *mis) + section_id, le->se->idstr); + return ret; + } ++ if (section_type == QEMU_VM_SECTION_END) { ++ error_report("after loading state section id %d(%s)", ++ section_id, le->se->idstr); ++ check_host_md5(); ++ } + if (!check_section_footer(f, le)) { + return -EINVAL; + } +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) + } + + cpu_synchronize_all_post_init(); ++ error_report("%s: after cpu_synchronize_all_post_init\n", +__func__); ++ check_host_md5(); + + return ret; + } +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +. +. +-- +Best regards. +Li Zhijian (8555) + +On 12/03/2015 05:24 PM, Dr. David Alan Gilbert wrote: +* Li Zhijian (address@hidden) wrote: +Hi all, + +Does anyboday remember the similar issue post by hailiang months ago +http://patchwork.ozlabs.org/patch/454322/ +At least tow bugs about migration had been fixed since that. +Yes, I wondered what happened to that. +And now we found the same issue at the tcg vm(kvm is fine), after migration, +the content VM's memory is inconsistent. 
+Hmm, TCG only - I don't know much about that; but I guess something must +be accessing memory without using the proper macros/functions so +it doesn't mark it as dirty. +we add a patch to check memory content, you can find it from affix + +steps to reporduce: +1) apply the patch and re-build qemu +2) prepare the ubuntu guest and run memtest in grub. +soruce side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off + +destination side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 + +3) start migration +with 1000M NIC, migration will finish within 3 min. + +at source: +(qemu) migrate tcp:192.168.2.66:8881 +after saving ram complete +e9e725df678d392b1a83b3a917f332bb +qemu-system-x86_64: end ram md5 +(qemu) + +at destination: +...skip... 
+Completed load of VM with exit code 0 seq iteration 1264 +Completed load of VM with exit code 0 seq iteration 1265 +Completed load of VM with exit code 0 seq iteration 1266 +qemu-system-x86_64: after loading state section id 2(ram) +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 +qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init + +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 + +This occurs occasionally and only at tcg machine. It seems that +some pages dirtied in source side don't transferred to destination. +This problem can be reproduced even if we disable virtio. + +Is it OK for some pages that not transferred to destination when do +migration ? Or is it a bug? +I'm pretty sure that means it's a bug. Hard to find though, I guess +at least memtest is smaller than a big OS. I think I'd dump the whole +of memory on both sides, hexdump and diff them - I'd guess it would +just be one byte/word different, maybe that would offer some idea what +wrote it. +I try to dump and compare them, more than 10 pages are different. +in source side, they are random value rather than always 'FF' 'FB' 'EF' +'BF'... in destination. +and not all of the different pages are continuous. + +thanks +Li +Dave +Any idea... 
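+Following Dave's suggestion to dump memory on both sides and diff, a small helper that compares two equal-length raw RAM dumps page by page and reports the differing page indices could look like this (an illustrative sketch; the function name and layout are invented here, and real dumps would first be read from files):
+
+```c
+#include <assert.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define PAGE_SIZE 4096
+
+/* Compare two equal-length buffers page by page; store the indices of
+ * differing pages in out[] (up to max_out) and return the total count. */
+static size_t diff_pages(const unsigned char *a, const unsigned char *b,
+                         size_t len, size_t *out, size_t max_out)
+{
+    size_t ndiff = 0;
+    for (size_t off = 0; off < len; off += PAGE_SIZE) {
+        size_t chunk = len - off < PAGE_SIZE ? len - off : PAGE_SIZE;
+        if (memcmp(a + off, b + off, chunk) != 0) {
+            if (ndiff < max_out) {
+                out[ndiff] = off / PAGE_SIZE;
+            }
+            ndiff++;
+        }
+    }
+    return ndiff;
+}
+```
+
+Knowing whether the differing pages cluster or scatter, as Li observed, narrows down which writer is being missed by the dirty tracking.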
+ +=================md5 check patch============================= + +diff --git a/Makefile.target b/Makefile.target +index 962d004..e2cb8e9 100644 +--- a/Makefile.target ++++ b/Makefile.target +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o + obj-y += memory_mapping.o + obj-y += dump.o + obj-y += migration/ram.o migration/savevm.o +-LIBS := $(libs_softmmu) $(LIBS) ++LIBS := $(libs_softmmu) $(LIBS) -lplumb + + # xen support + obj-$(CONFIG_XEN) += xen-common.o +diff --git a/migration/ram.c b/migration/ram.c +index 1eb155a..3b7a09d 100644 +--- a/migration/ram.c ++++ b/migration/ram.c +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +version_id) + } + + rcu_read_unlock(); +- DPRINTF("Completed load of VM with exit code %d seq iteration " ++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " + "%" PRIu64 "\n", ret, seq_iter); + return ret; + } +diff --git a/migration/savevm.c b/migration/savevm.c +index 0ad1b93..3feaa61 100644 +--- a/migration/savevm.c ++++ b/migration/savevm.c +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) + + } + ++#include "exec/ram_addr.h" ++#include "qemu/rcu_queue.h" ++#include ++#ifndef MD5_DIGEST_LENGTH ++#define MD5_DIGEST_LENGTH 16 ++#endif ++ ++static void check_host_md5(void) ++{ ++ int i; ++ unsigned char md[MD5_DIGEST_LENGTH]; ++ rcu_read_lock(); ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +'pc.ram' block */ ++ rcu_read_unlock(); ++ ++ MD5(block->host, block->used_length, md); ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { ++ fprintf(stderr, "%02x", md[i]); ++ } ++ fprintf(stderr, "\n"); ++ error_report("end ram md5"); ++} ++ + void qemu_savevm_state_begin(QEMUFile *f, + const MigrationParams *params) + { +@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, +bool iterable_only) + save_section_header(f, se, QEMU_VM_SECTION_END); + + ret = se->ops->save_live_complete_precopy(f, se->opaque); ++ ++ fprintf(stderr, "after saving %s 
complete\n", se->idstr); ++ check_host_md5(); ++ + trace_savevm_section_end(se->idstr, se->section_id, ret); + save_section_footer(f, se); + if (ret < 0) { +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +MigrationIncomingState *mis) + section_id, le->se->idstr); + return ret; + } ++ if (section_type == QEMU_VM_SECTION_END) { ++ error_report("after loading state section id %d(%s)", ++ section_id, le->se->idstr); ++ check_host_md5(); ++ } + if (!check_section_footer(f, le)) { + return -EINVAL; + } +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) + } + + cpu_synchronize_all_post_init(); ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); ++ check_host_md5(); + + return ret; + } +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + + +. +-- +Best regards. +Li Zhijian (8555) + +* Li Zhijian (address@hidden) wrote: +> +> +> +On 12/03/2015 05:24 PM, Dr. David Alan Gilbert wrote: +> +>* Li Zhijian (address@hidden) wrote: +> +>>Hi all, +> +>> +> +>>Does anyboday remember the similar issue post by hailiang months ago +> +>> +http://patchwork.ozlabs.org/patch/454322/ +> +>>At least tow bugs about migration had been fixed since that. +> +> +> +>Yes, I wondered what happened to that. +> +> +> +>>And now we found the same issue at the tcg vm(kvm is fine), after migration, +> +>>the content VM's memory is inconsistent. +> +> +> +>Hmm, TCG only - I don't know much about that; but I guess something must +> +>be accessing memory without using the proper macros/functions so +> +>it doesn't mark it as dirty. +> +> +> +>>we add a patch to check memory content, you can find it from affix +> +>> +> +>>steps to reporduce: +> +>>1) apply the patch and re-build qemu +> +>>2) prepare the ubuntu guest and run memtest in grub. 
+> +>>soruce side: +> +>>x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +>>e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +>>if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +>>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +>>-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +>>tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +>>pc-i440fx-2.3,accel=tcg,usb=off +> +>> +> +>>destination side: +> +>>x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +>>e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +>>if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +>>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +>>-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +>>tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +>>pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 +> +>> +> +>>3) start migration +> +>>with 1000M NIC, migration will finish within 3 min. +> +>> +> +>>at source: +> +>>(qemu) migrate tcp:192.168.2.66:8881 +> +>>after saving ram complete +> +>>e9e725df678d392b1a83b3a917f332bb +> +>>qemu-system-x86_64: end ram md5 +> +>>(qemu) +> +>> +> +>>at destination: +> +>>...skip... +> +>>Completed load of VM with exit code 0 seq iteration 1264 +> +>>Completed load of VM with exit code 0 seq iteration 1265 +> +>>Completed load of VM with exit code 0 seq iteration 1266 +> +>>qemu-system-x86_64: after loading state section id 2(ram) +> +>>49c2dac7bde0e5e22db7280dcb3824f9 +> +>>qemu-system-x86_64: end ram md5 +> +>>qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init +> +>> +> +>>49c2dac7bde0e5e22db7280dcb3824f9 +> +>>qemu-system-x86_64: end ram md5 +> +>> +> +>>This occurs occasionally and only at tcg machine. It seems that +> +>>some pages dirtied in source side don't transferred to destination. 
+> +>>This problem can be reproduced even if we disable virtio. +> +>> +> +>>Is it OK for some pages that not transferred to destination when do +> +>>migration ? Or is it a bug? +> +> +> +>I'm pretty sure that means it's a bug. Hard to find though, I guess +> +>at least memtest is smaller than a big OS. I think I'd dump the whole +> +>of memory on both sides, hexdump and diff them - I'd guess it would +> +>just be one byte/word different, maybe that would offer some idea what +> +>wrote it. +> +> +I try to dump and compare them, more than 10 pages are different. +> +in source side, they are random value rather than always 'FF' 'FB' 'EF' +> +'BF'... in destination. +> +> +and not all of the different pages are continuous. +I wonder if it happens on all of memtest's different test patterns, +perhaps it might be possible to narrow it down if you tell memtest +to only run one test at a time. + +Dave + +> +> +thanks +> +Li +> +> +> +> +> +>Dave +> +> +> +>>Any idea... +> +>> +> +>>=================md5 check patch============================= +> +>> +> +>>diff --git a/Makefile.target b/Makefile.target +> +>>index 962d004..e2cb8e9 100644 +> +>>--- a/Makefile.target +> +>>+++ b/Makefile.target +> +>>@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o +> +>> obj-y += memory_mapping.o +> +>> obj-y += dump.o +> +>> obj-y += migration/ram.o migration/savevm.o +> +>>-LIBS := $(libs_softmmu) $(LIBS) +> +>>+LIBS := $(libs_softmmu) $(LIBS) -lplumb +> +>> +> +>> # xen support +> +>> obj-$(CONFIG_XEN) += xen-common.o +> +>>diff --git a/migration/ram.c b/migration/ram.c +> +>>index 1eb155a..3b7a09d 100644 +> +>>--- a/migration/ram.c +> +>>+++ b/migration/ram.c +> +>>@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +> +>>version_id) +> +>> } +> +>> +> +>> rcu_read_unlock(); +> +>>- DPRINTF("Completed load of VM with exit code %d seq iteration " +> +>>+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " +> +>> "%" PRIu64 "\n", ret, seq_iter); +> 
+>> return ret; +> +>> } +> +>>diff --git a/migration/savevm.c b/migration/savevm.c +> +>>index 0ad1b93..3feaa61 100644 +> +>>--- a/migration/savevm.c +> +>>+++ b/migration/savevm.c +> +>>@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) +> +>> +> +>> } +> +>> +> +>>+#include "exec/ram_addr.h" +> +>>+#include "qemu/rcu_queue.h" +> +>>+#include +> +>>+#ifndef MD5_DIGEST_LENGTH +> +>>+#define MD5_DIGEST_LENGTH 16 +> +>>+#endif +> +>>+ +> +>>+static void check_host_md5(void) +> +>>+{ +> +>>+ int i; +> +>>+ unsigned char md[MD5_DIGEST_LENGTH]; +> +>>+ rcu_read_lock(); +> +>>+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +> +>>'pc.ram' block */ +> +>>+ rcu_read_unlock(); +> +>>+ +> +>>+ MD5(block->host, block->used_length, md); +> +>>+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { +> +>>+ fprintf(stderr, "%02x", md[i]); +> +>>+ } +> +>>+ fprintf(stderr, "\n"); +> +>>+ error_report("end ram md5"); +> +>>+} +> +>>+ +> +>> void qemu_savevm_state_begin(QEMUFile *f, +> +>> const MigrationParams *params) +> +>> { +> +>>@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, +> +>>bool iterable_only) +> +>> save_section_header(f, se, QEMU_VM_SECTION_END); +> +>> +> +>> ret = se->ops->save_live_complete_precopy(f, se->opaque); +> +>>+ +> +>>+ fprintf(stderr, "after saving %s complete\n", se->idstr); +> +>>+ check_host_md5(); +> +>>+ +> +>> trace_savevm_section_end(se->idstr, se->section_id, ret); +> +>> save_section_footer(f, se); +> +>> if (ret < 0) { +> +>>@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +> +>>MigrationIncomingState *mis) +> +>> section_id, le->se->idstr); +> +>> return ret; +> +>> } +> +>>+ if (section_type == QEMU_VM_SECTION_END) { +> +>>+ error_report("after loading state section id %d(%s)", +> +>>+ section_id, le->se->idstr); +> +>>+ check_host_md5(); +> +>>+ } +> +>> if (!check_section_footer(f, le)) { +> +>> return -EINVAL; +> +>> } +> +>>@@ -1901,6 +1933,8 @@ int 
qemu_loadvm_state(QEMUFile *f) +> +>> } +> +>> +> +>> cpu_synchronize_all_post_init(); +> +>>+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); +> +>>+ check_host_md5(); +> +>> +> +>> return ret; +> +>> } +> +>> +> +>> +> +>> +> +>-- +> +>Dr. David Alan Gilbert / address@hidden / Manchester, UK +> +> +> +> +> +>. +> +> +> +> +-- +> +Best regards. +> +Li Zhijian (8555) +> +> +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +Li Zhijian wrote: +> +Hi all, +> +> +Does anyboday remember the similar issue post by hailiang months ago +> +http://patchwork.ozlabs.org/patch/454322/ +> +At least tow bugs about migration had been fixed since that. +> +> +And now we found the same issue at the tcg vm(kvm is fine), after +> +migration, the content VM's memory is inconsistent. +> +> +we add a patch to check memory content, you can find it from affix +> +> +steps to reporduce: +> +1) apply the patch and re-build qemu +> +2) prepare the ubuntu guest and run memtest in grub. 
+> +soruce side: +> +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +pc-i440fx-2.3,accel=tcg,usb=off +> +> +destination side: +> +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 +> +> +3) start migration +> +with 1000M NIC, migration will finish within 3 min. +> +> +at source: +> +(qemu) migrate tcp:192.168.2.66:8881 +> +after saving ram complete +> +e9e725df678d392b1a83b3a917f332bb +> +qemu-system-x86_64: end ram md5 +> +(qemu) +> +> +at destination: +> +...skip... +> +Completed load of VM with exit code 0 seq iteration 1264 +> +Completed load of VM with exit code 0 seq iteration 1265 +> +Completed load of VM with exit code 0 seq iteration 1266 +> +qemu-system-x86_64: after loading state section id 2(ram) +> +49c2dac7bde0e5e22db7280dcb3824f9 +> +qemu-system-x86_64: end ram md5 +> +qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init +> +> +49c2dac7bde0e5e22db7280dcb3824f9 +> +qemu-system-x86_64: end ram md5 +> +> +This occurs occasionally and only at tcg machine. It seems that +> +some pages dirtied in source side don't transferred to destination. +> +This problem can be reproduced even if we disable virtio. 
+>
+>
+Is it OK for some pages that not transferred to destination when do
+>
+migration ? Or is it a bug?
+>
+>
+Any idea...
+Thanks for describing how to reproduce the bug.
+If some pages are not transferred to the destination then it is a bug, so we
+need to know what the problem is. Notice that the problem can be that
+TCG is not marking some page dirty, that the migration code "forgets" about
+that page, or anything else altogether; that is what we need to find.
+
+There are more possibilities: I am not sure whether memtest runs in 32-bit
+mode, and it is within the realm of possibility that we are missing some
+state when we are in real mode.
+
+Will try to take a look at this.
+
+Thanks again.
+
+
+>
+>
+=================md5 check patch=============================
+>
+>
+diff --git a/Makefile.target b/Makefile.target
+>
+index 962d004..e2cb8e9 100644
+>
+--- a/Makefile.target
+>
++++ b/Makefile.target
+>
+@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o
+>
+obj-y += memory_mapping.o
+>
+obj-y += dump.o
+>
+obj-y += migration/ram.o migration/savevm.o
+>
+-LIBS := $(libs_softmmu) $(LIBS)
+>
++LIBS := $(libs_softmmu) $(LIBS) -lplumb
+>
+>
+# xen support
+>
+obj-$(CONFIG_XEN) += xen-common.o
+>
+diff --git a/migration/ram.c b/migration/ram.c
+>
+index 1eb155a..3b7a09d 100644
+>
+--- a/migration/ram.c
+>
++++ b/migration/ram.c
+>
+@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque,
+>
+int version_id)
+>
+}
+>
+>
+rcu_read_unlock();
+>
+- DPRINTF("Completed load of VM with exit code %d seq iteration "
+>
++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration "
+>
+"%" PRIu64 "\n", ret, seq_iter);
+>
+return ret;
+>
+}
+>
+diff --git a/migration/savevm.c b/migration/savevm.c
+>
+index 0ad1b93..3feaa61 100644
+>
+--- a/migration/savevm.c
+>
++++ b/migration/savevm.c
+>
+@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f)
+>
+>
+}
+>
+>
++#include "exec/ram_addr.h"
+>
++#include "qemu/rcu_queue.h"
+>
++#include
+>
++#ifndef MD5_DIGEST_LENGTH
+>
++#define MD5_DIGEST_LENGTH 16 +> ++#endif +> ++ +> ++static void check_host_md5(void) +> ++{ +> ++ int i; +> ++ unsigned char md[MD5_DIGEST_LENGTH]; +> ++ rcu_read_lock(); +> ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +> +'pc.ram' block */ +> ++ rcu_read_unlock(); +> ++ +> ++ MD5(block->host, block->used_length, md); +> ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { +> ++ fprintf(stderr, "%02x", md[i]); +> ++ } +> ++ fprintf(stderr, "\n"); +> ++ error_report("end ram md5"); +> ++} +> ++ +> +void qemu_savevm_state_begin(QEMUFile *f, +> +const MigrationParams *params) +> +{ +> +@@ -1056,6 +1079,10 @@ void +> +qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only) +> +save_section_header(f, se, QEMU_VM_SECTION_END); +> +> +ret = se->ops->save_live_complete_precopy(f, se->opaque); +> ++ +> ++ fprintf(stderr, "after saving %s complete\n", se->idstr); +> ++ check_host_md5(); +> ++ +> +trace_savevm_section_end(se->idstr, se->section_id, ret); +> +save_section_footer(f, se); +> +if (ret < 0) { +> +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +> +MigrationIncomingState *mis) +> +section_id, le->se->idstr); +> +return ret; +> +} +> ++ if (section_type == QEMU_VM_SECTION_END) { +> ++ error_report("after loading state section id %d(%s)", +> ++ section_id, le->se->idstr); +> ++ check_host_md5(); +> ++ } +> +if (!check_section_footer(f, le)) { +> +return -EINVAL; +> +} +> +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) +> +} +> +> +cpu_synchronize_all_post_init(); +> ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); +> ++ check_host_md5(); +> +> +return ret; +> +} + +> +> +Thanks for describing how to reproduce the bug. 
+>
+If some pages are not transferred to destination then it is a bug, so we need
+>
+to know what the problem is, notice that the problem can be that TCG is not
+>
+marking dirty some page, that Migration code "forgets" about that page, or
+>
+anything eles altogether, that is what we need to find.
+>
+>
+There are more posibilities, I am not sure that memtest is on 32bit mode, and
+>
+it is inside posibility that we are missing some state when we are on real
+>
+mode.
+>
+>
+Will try to take a look at this.
+>
+>
+THanks, again.
+>
+Hi Juan & Amit
+
+ Do you think we should add a mechanism to check the data integrity during LM
+like Zhijian's patch did? It may be very helpful for developers.
+ Actually, I did a similar thing before in order to make sure that I did the
+right thing when I changed the code related to LM.
+
+Liang
+
+On (Fri) 04 Dec 2015 [01:43:07], Li, Liang Z wrote:
+>
+>
+>
+> Thanks for describing how to reproduce the bug.
+>
+> If some pages are not transferred to destination then it is a bug, so we
+>
+> need
+>
+> to know what the problem is, notice that the problem can be that TCG is not
+>
+> marking dirty some page, that Migration code "forgets" about that page, or
+>
+> anything eles altogether, that is what we need to find.
+>
+>
+>
+> There are more posibilities, I am not sure that memtest is on 32bit mode,
+>
+> and
+>
+> it is inside posibility that we are missing some state when we are on real
+>
+> mode.
+>
+>
+>
+> Will try to take a look at this.
+>
+>
+>
+> THanks, again.
+>
+>
+>
+>
+Hi Juan & Amit
+>
+>
+Do you think we should add a mechanism to check the data integrity during LM
+>
+like Zhijian's patch did? it may be very helpful for developers.
+>
+Actually, I did the similar thing before in order to make sure that I did
+>
+the right thing we I change the code related to LM.
+If you mean for debugging, something that's not always on, then I'm
+fine with it.
+
+A script that goes along with it and shows the result of comparing the
+dumps would be helpful too, something that shows how many pages are
+different, how many bytes in a page differ on average, and so on.
+
+ Amit
+
diff --git a/classification_output/01/mistranslation/74545755 b/classification_output/01/mistranslation/74545755
new file mode 100644
index 00000000..32d247ac
--- /dev/null
+++ b/classification_output/01/mistranslation/74545755
@@ -0,0 +1,344 @@
+mistranslation: 0.752
+instruction: 0.700
+other: 0.683
+semantic: 0.669
+
+[Bug Report][RFC PATCH 0/1] block: fix failing assert on paused VM migration
+
+There's a bug (failing assert) which is reproduced during migration of
+a paused VM. I am able to reproduce it on a test setup with 2 nodes and a
+common NFS share, with the VM's disk on that share.
+
+root@fedora40-1-vm:~# virsh domblklist alma8-vm
+ Target Source
+------------------------------------------
+ sda /mnt/shared/images/alma8.qcow2
+
+root@fedora40-1-vm:~# df -Th /mnt/shared
+Filesystem Type Size Used Avail Use% Mounted on
+127.0.0.1:/srv/nfsd nfs4 63G 16G 48G 25% /mnt/shared
+
+On the 1st node:
+
+root@fedora40-1-vm:~# virsh start alma8-vm ; virsh suspend alma8-vm
+root@fedora40-1-vm:~# virsh migrate --compressed --p2p --persistent
+--undefinesource --live alma8-vm qemu+ssh://fedora40-2-vm/system
+
+Then on the 2nd node:
+
+root@fedora40-2-vm:~# virsh migrate --compressed --p2p --persistent
+--undefinesource --live alma8-vm qemu+ssh://fedora40-1-vm/system
+error: operation failed: domain is not running
+
+root@fedora40-2-vm:~# tail -3 /var/log/libvirt/qemu/alma8-vm.log
+2024-09-19 13:53:33.336+0000: initiating migration
+qemu-system-x86_64: ../block.c:6976: int
+bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags &
+BDRV_O_INACTIVE)' failed.
+2024-09-19 13:53:42.991+0000: shutting down, reason=crashed
+
+Backtrace:
+
+(gdb) bt
+#0 0x00007f7eaa2f1664 in __pthread_kill_implementation () at /lib64/libc.so.6
+#1 0x00007f7eaa298c4e in raise () at /lib64/libc.so.6
+#2 0x00007f7eaa280902 in abort () at /lib64/libc.so.6
+#3 0x00007f7eaa28081e in __assert_fail_base.cold () at /lib64/libc.so.6
+#4 0x00007f7eaa290d87 in __assert_fail () at /lib64/libc.so.6
+#5 0x0000563c38b95eb8 in bdrv_inactivate_recurse (bs=0x563c3b6c60c0) at
+../block.c:6976
+#6 0x0000563c38b95aeb in bdrv_inactivate_all () at ../block.c:7038
+#7 0x0000563c3884d354 in qemu_savevm_state_complete_precopy_non_iterable
+(f=0x563c3b700c20, in_postcopy=false, inactivate_disks=true)
+ at ../migration/savevm.c:1571
+#8 0x0000563c3884dc1a in qemu_savevm_state_complete_precopy (f=0x563c3b700c20,
+iterable_only=false, inactivate_disks=true) at ../migration/savevm.c:1631
+#9 0x0000563c3883a340 in migration_completion_precopy (s=0x563c3b4d51f0,
+current_active_state=) at ../migration/migration.c:2780
+#10 migration_completion (s=0x563c3b4d51f0) at ../migration/migration.c:2844
+#11 migration_iteration_run (s=0x563c3b4d51f0) at ../migration/migration.c:3270
+#12 migration_thread (opaque=0x563c3b4d51f0) at ../migration/migration.c:3536
+#13 0x0000563c38dbcf14 in qemu_thread_start (args=0x563c3c2d5bf0) at
+../util/qemu-thread-posix.c:541
+#14 0x00007f7eaa2ef6d7 in start_thread () at /lib64/libc.so.6
+#15 0x00007f7eaa373414 in clone () at /lib64/libc.so.6
+
+What happens here is that after the 1st migration the BDS related to the HDD
+remains inactive, as the VM is still paused. Then when we initiate the 2nd
+migration, bdrv_inactivate_all() leads to an attempt to set the
+BDRV_O_INACTIVE flag on that node, where it is already set, and thus the
+assert fails.
+
+The attached patch, which simply skips setting the flag if it's already set,
+is more of a kludge than a clean solution.
Should we use more sophisticated logic +which allows some of the nodes be in inactive state prior to the migration, +and takes them into account during bdrv_inactivate_all()? Comments would +be appreciated. + +Andrey + +Andrey Drobyshev (1): + block: do not fail when inactivating node which is inactive + + block.c | 10 +++++++++- + 1 file changed, 9 insertions(+), 1 deletion(-) + +-- +2.39.3 + +Instead of throwing an assert let's just ignore that flag is already set +and return. We assume that it's going to be safe to ignore. Otherwise +this assert fails when migrating a paused VM back and forth. + +Ideally we'd like to have a more sophisticated solution, e.g. not even +scan the nodes which should be inactive at this point. + +Signed-off-by: Andrey Drobyshev +--- + block.c | 10 +++++++++- + 1 file changed, 9 insertions(+), 1 deletion(-) + +diff --git a/block.c b/block.c +index 7d90007cae..c1dcf906d1 100644 +--- a/block.c ++++ b/block.c +@@ -6973,7 +6973,15 @@ static int GRAPH_RDLOCK +bdrv_inactivate_recurse(BlockDriverState *bs) + return 0; + } + +- assert(!(bs->open_flags & BDRV_O_INACTIVE)); ++ if (bs->open_flags & BDRV_O_INACTIVE) { ++ /* ++ * Return here instead of throwing assert as a workaround to ++ * prevent failure on migrating paused VM. ++ * Here we assume that if we're trying to inactivate BDS that's ++ * already inactive, it's safe to just ignore it. ++ */ ++ return 0; ++ } + + /* Inactivate this node */ + if (bs->drv->bdrv_inactivate) { +-- +2.39.3 + +[add migration maintainers] + +On 24.09.24 15:56, Andrey Drobyshev wrote: +Instead of throwing an assert let's just ignore that flag is already set +and return. We assume that it's going to be safe to ignore. Otherwise +this assert fails when migrating a paused VM back and forth. + +Ideally we'd like to have a more sophisticated solution, e.g. not even +scan the nodes which should be inactive at this point. 
+
+Signed-off-by: Andrey Drobyshev
+---
+ block.c | 10 +++++++++-
+ 1 file changed, 9 insertions(+), 1 deletion(-)
+
+diff --git a/block.c b/block.c
+index 7d90007cae..c1dcf906d1 100644
+--- a/block.c
++++ b/block.c
+@@ -6973,7 +6973,15 @@ static int GRAPH_RDLOCK
+bdrv_inactivate_recurse(BlockDriverState *bs)
+ return 0;
+ }
+- assert(!(bs->open_flags & BDRV_O_INACTIVE));
++ if (bs->open_flags & BDRV_O_INACTIVE) {
++ /*
++ * Return here instead of throwing assert as a workaround to
++ * prevent failure on migrating paused VM.
++ * Here we assume that if we're trying to inactivate BDS that's
++ * already inactive, it's safe to just ignore it.
++ */
++ return 0;
++ }
+/* Inactivate this node */
+if (bs->drv->bdrv_inactivate) {
+I doubt that this is a correct way to go.
+
+As far as I understand, "inactive" actually means that "the storage does not
+belong to qemu, but to someone else (another qemu process, for example), and
+may be changed transparently". In turn this means that QEMU should do nothing
+with inactive disks. So the problem is that nobody called bdrv_activate_all()
+on the target, and we shouldn't ignore that.
+
+Hmm, I see in process_incoming_migration_bh() we do call bdrv_activate_all(),
+but only in some scenarios. Maybe the condition should be less strict here.
+
+Why do we need any condition here at all? Don't we want to activate the block
+layer on the target after migration anyway?
+
+--
+Best regards,
+Vladimir
+
+On 9/30/24 12:25 PM, Vladimir Sementsov-Ogievskiy wrote:
+>
+[add migration maintainers]
+>
+>
+On 24.09.24 15:56, Andrey Drobyshev wrote:
+>
+> [...]
+>
+>
+I doubt that this a correct way to go.
+>
+>
+As far as I understand, "inactive" actually means that "storage is not
+>
+belong to qemu, but to someone else (another qemu process for example),
+>
+and may be changed transparently". In turn this means that Qemu should
+>
+do nothing with inactive disks. So the problem is that nobody called
+>
+bdrv_activate_all on target, and we shouldn't ignore that.
+> +> +Hmm, I see in process_incoming_migration_bh() we do call +> +bdrv_activate_all(), but only in some scenarios. May be, the condition +> +should be less strict here. +> +> +Why we need any condition here at all? Don't we want to activate +> +block-layer on target after migration anyway? +> +Hmm I'm not sure about the unconditional activation, since we at least +have to honor LATE_BLOCK_ACTIVATE cap if it's set (and probably delay it +in such a case). In current libvirt upstream I see such code: + +> +/* Migration capabilities which should always be enabled as long as they +> +> +* are supported by QEMU. If the capability is supposed to be enabled on both +> +> +* sides of migration, it won't be enabled unless both sides support it. +> +> +*/ +> +> +static const qemuMigrationParamsAlwaysOnItem qemuMigrationParamsAlwaysOn[] = +> +{ +> +> +{QEMU_MIGRATION_CAP_PAUSE_BEFORE_SWITCHOVER, +> +> +QEMU_MIGRATION_SOURCE}, +> +> +> +> +{QEMU_MIGRATION_CAP_LATE_BLOCK_ACTIVATE, +> +> +QEMU_MIGRATION_DESTINATION}, +> +> +}; +which means that libvirt always wants LATE_BLOCK_ACTIVATE to be set. + +The code from process_incoming_migration_bh() you're referring to: + +> +/* If capability late_block_activate is set: +> +> +* Only fire up the block code now if we're going to restart the +> +> +* VM, else 'cont' will do it. +> +> +* This causes file locking to happen; so we don't want it to happen +> +> +* unless we really are starting the VM. +> +> +*/ +> +> +if (!migrate_late_block_activate() || +> +> +(autostart && (!global_state_received() || +> +> +runstate_is_live(global_state_get_runstate())))) { +> +> +/* Make sure all file formats throw away their mutable metadata. +> +> +> +* If we get an error here, just don't restart the VM yet. 
*/ +> +> +bdrv_activate_all(&local_err); +> +> +if (local_err) { +> +> +error_report_err(local_err); +> +> +local_err = NULL; +> +> +autostart = false; +> +> +} +> +> +} +It states explicitly that we're either going to start VM right at this +point if (autostart == true), or we wait till "cont" command happens. +None of this is going to happen if we start another migration while +still being in PAUSED state. So I think it seems reasonable to take +such case into account. For instance, this patch does prevent the crash: + +> +diff --git a/migration/migration.c b/migration/migration.c +> +index ae2be31557..3222f6745b 100644 +> +--- a/migration/migration.c +> ++++ b/migration/migration.c +> +@@ -733,7 +733,8 @@ static void process_incoming_migration_bh(void *opaque) +> +*/ +> +if (!migrate_late_block_activate() || +> +(autostart && (!global_state_received() || +> +- runstate_is_live(global_state_get_runstate())))) { +> ++ runstate_is_live(global_state_get_runstate()))) || +> ++ (!autostart && global_state_get_runstate() == RUN_STATE_PAUSED)) { +> +/* Make sure all file formats throw away their mutable metadata. +> +* If we get an error here, just don't restart the VM yet. */ +> +bdrv_activate_all(&local_err); +What are your thoughts on it? + +Andrey + diff --git a/classification_output/01/mistranslation/7711787 b/classification_output/01/mistranslation/7711787 deleted file mode 100644 index ead1f32f..00000000 --- a/classification_output/01/mistranslation/7711787 +++ /dev/null @@ -1,165 +0,0 @@ -mistranslation: 0.915 -semantic: 0.904 -instruction: 0.888 -other: 0.813 - -[BUG] cxl,i386: e820 mappings may not be correct for cxl - -Context included below from prior discussion - - `cxl create-region` would fail on inability to allocate memory - - traced this down to the memory region being marked RESERVED - - E820 map marks the CXL fixed memory window as RESERVED - - -Re: x86 errors, I found that region worked with this patch. 
(I also -added the SRAT patches the Davidlohr posted, but I do not think they are -relevant). - -I don't think this is correct, and setting this to E820_RAM causes the -system to fail to boot at all, but with this change `cxl create-region` -succeeds, which suggests our e820 mappings in the i386 machine are -incorrect. - -Anyone who can help or have an idea as to what e820 should actually be -doing with this region, or if this is correct and something else is -failing, please help! - - -diff --git a/hw/i386/pc.c b/hw/i386/pc.c -index 566accf7e6..a5e688a742 100644 ---- a/hw/i386/pc.c -+++ b/hw/i386/pc.c -@@ -1077,7 +1077,7 @@ void pc_memory_init(PCMachineState *pcms, - memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw, - "cxl-fixed-memory-region", fw->size); - memory_region_add_subregion(system_memory, fw->base, &fw->mr); -- e820_add_entry(fw->base, fw->size, E820_RESERVED); -+ e820_add_entry(fw->base, fw->size, E820_NVS); - cxl_fmw_base += fw->size; - cxl_resv_end = cxl_fmw_base; - } - - -On Mon, Oct 10, 2022 at 05:32:42PM +0100, Jonathan Cameron wrote: -> -> -> > but i'm not sure of what to do with this info. We have some proof -> -> > that real hardware works with this no problem, and the only difference -> -> > is that the EFI/bios/firmware is setting the memory regions as `usable` -> -> > or `soft reserved`, which would imply the EDK2 is the blocker here -> -> > regardless of the OS driver status. -> -> > -> -> > But I'd seen elsewhere you had gotten some of this working, and I'm -> -> > failing to get anything working at the moment. If you have any input i -> -> > would greatly appreciate the help. 
-> -> > -> -> > QEMU config: -> -> > -> -> > /opt/qemu-cxl2/bin/qemu-system-x86_64 \ -> -> > -drive -> -> > file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=d\ -> -> > -m 2G,slots=4,maxmem=4G \ -> -> > -smp 4 \ -> -> > -machine type=q35,accel=kvm,cxl=on \ -> -> > -enable-kvm \ -> -> > -nographic \ -> -> > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \ -> -> > -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 \ -> -> > -object memory-backend-file,id=cxl-mem0,mem-path=/tmp/cxl-mem0,size=256M \ -> -> > -object memory-backend-file,id=lsa0,mem-path=/tmp/cxl-lsa0,size=256M \ -> -> > -device cxl-type3,bus=rp0,pmem=true,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 -> -> > \ -> -> > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=256M -> -> > -> -> > I'd seen on the lists that you had seen issues with single-rp setups, -> -> > but no combination of configuration I've tried (including all the ones -> -> > in the docs and tests) lead to a successful region creation with -> -> > `cxl create-region` -> -> -> -> Hmm. Let me have a play. I've not run x86 tests for a while so -> -> perhaps something is missing there. -> -> -> -> I'm carrying a patch to override check_last_peer() in -> -> cxl_port_setup_targets() as that is wrong for some combinations, -> -> but that doesn't look like it's related to what you are seeing. -> -> -I'm not sure if it's relevant, but turned out I'd forgotten I'm carrying 3 -> -patches that aren't upstream (and one is a horrible hack). -> -> -Hack: -https://lore.kernel.org/linux-cxl/20220819094655.000005ed@huawei.com/ -> -Shouldn't affect a simple case like this... 
-> -> -https://lore.kernel.org/linux-cxl/20220819093133.00006c22@huawei.com/T/#t -> -(Dan's version) -> -> -https://lore.kernel.org/linux-cxl/20220815154044.24733-1-Jonathan.Cameron@huawei.com/T/#t -> -> -For writes to work you will currently need two rps (nothing on the second is -> -fine) -> -as we still haven't resolved if the kernel should support an HDM decoder on -> -a host bridge with one port. I think it should (Spec allows it), others -> -unconvinced. -> -> -Note I haven't shifted over to x86 yet so may still be something different -> -from -> -arm64. -> -> -Jonathan -> -> - diff --git a/classification_output/01/mistranslation/80604314 b/classification_output/01/mistranslation/80604314 new file mode 100644 index 00000000..798c2e86 --- /dev/null +++ b/classification_output/01/mistranslation/80604314 @@ -0,0 +1,1480 @@ +mistranslation: 0.922 +other: 0.898 +semantic: 0.890 +instruction: 0.877 + +[BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device + +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=, + config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) + +Starting qemu with no additional "-device virtio-net-ccw" (i.e., only +the autogenerated virtio-net-ccw device is present) works. Specifying +several "-device virtio-net-pci" works as well. + +Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net +client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") +works (in-between state does not compile). + +This is reproducible with tcg as well. Same problem both with +--enable-vhost-vdpa and --disable-vhost-vdpa. + +Have not yet tried to figure out what might be special with +virtio-ccw... anyone have an idea? 
+ +[This should probably be considered a blocker?] + +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +When I start qemu with a second virtio-net-ccw device (i.e. adding +> +-device virtio-net-ccw in addition to the autogenerated device), I get +> +a segfault. gdb points to +> +> +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=, +> +config=0x55d6ad9e3f80 "RT") at +> +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> +(backtrace doesn't go further) +> +> +Starting qemu with no additional "-device virtio-net-ccw" (i.e., only +> +the autogenerated virtio-net-ccw device is present) works. Specifying +> +several "-device virtio-net-pci" works as well. +> +> +Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net +> +client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") +> +works (in-between state does not compile). +Ouch. I didn't test all in-between states :( +But I wish we had a 0-day infrastructure like the kernel has, +that catches things like that. + +> +This is reproducible with tcg as well. Same problem both with +> +--enable-vhost-vdpa and --disable-vhost-vdpa. +> +> +Have not yet tried to figure out what might be special with +> +virtio-ccw... anyone have an idea?
gdb points to +> +> +> +> #0 0x000055d6ab52681d in virtio_net_get_config (vdev=, +> +> config=0x55d6ad9e3f80 "RT") at +> +> /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> +> +> (backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. + +> +> +> +> Starting qemu with no additional "-device virtio-net-ccw" (i.e., only +> +> the autogenerated virtio-net-ccw device is present) works. Specifying +> +> several "-device virtio-net-pci" works as well. +> +> +> +> Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net +> +> client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") +> +> works (in-between state does not compile). +> +> +Ouch. I didn't test all in-between states :( +> +But I wish we had a 0-day instrastructure like kernel has, +> +that catches things like that. +Yep, that would be useful... so patchew only builds the complete series? + +> +> +> This is reproducible with tcg as well. Same problem both with +> +> --enable-vhost-vdpa and --disable-vhost-vdpa. +> +> +> +> Have not yet tried to figure out what might be special with +> +> virtio-ccw... anyone have an idea? +> +> +> +> [This should probably be considered a blocker?] 
+I think so, as it makes s390x unusable with more than one +virtio-net-ccw device, and I don't even see a workaround. + +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +On Fri, 24 Jul 2020 09:30:58 -0400 +> +"Michael S. Tsirkin" wrote: +> +> +> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +> > When I start qemu with a second virtio-net-ccw device (i.e. adding +> +> > -device virtio-net-ccw in addition to the autogenerated device), I get +> +> > a segfault. gdb points to +> +> > +> +> > #0 0x000055d6ab52681d in virtio_net_get_config (vdev=, +> +> > config=0x55d6ad9e3f80 "RT") at +> +> > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> > 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> > +> +> > (backtrace doesn't go further) +> +> +The core was incomplete, but running under gdb directly shows that it +> +> is just a bog-standard config space access (first for that device). +> +> +> +> The cause of the crash is that nc->peer is not set... no idea how that +> +> can happen, not that familiar with that part of QEMU.
+> +> > +> +> > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net +> +> > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") +> +> > works (in-between state does not compile). +> +> +> +> Ouch. I didn't test all in-between states :( +> +> But I wish we had a 0-day instrastructure like kernel has, +> +> that catches things like that. +> +> +Yep, that would be useful... so patchew only builds the complete series? +> +> +> +> +> > This is reproducible with tcg as well. Same problem both with +> +> > --enable-vhost-vdpa and --disable-vhost-vdpa. +> +> > +> +> > Have not yet tried to figure out what might be special with +> +> > virtio-ccw... anyone have an idea? +> +> > +> +> > [This should probably be considered a blocker?] +> +> +I think so, as it makes s390x unusable with more that one +> +virtio-net-ccw device, and I don't even see a workaround. + +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin" wrote: + +> +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +> On Fri, 24 Jul 2020 09:30:58 -0400 +> +> "Michael S. Tsirkin" wrote: +> +> +> +> > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +> > > When I start qemu with a second virtio-net-ccw device (i.e. adding +> +> > > -device virtio-net-ccw in addition to the autogenerated device), I get +> +> > > a segfault. gdb points to +> +> > > +> +> > > #0 0x000055d6ab52681d in virtio_net_get_config (vdev=, +> +> > > config=0x55d6ad9e3f80 "RT") at +> +> > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> > > 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> > > +> +> > > (backtrace doesn't go further) +> +> +> +> The core was incomplete, but running under gdb directly shows that it +> +> is just a bog-standard config space access (first for that device). +> +> +> +> The cause of the crash is that nc->peer is not set... no idea how that +> +> can happen, not that familiar with that part of QEMU. 
(Should the code +> +> check, or is that really something that should not happen?) +> +> +> +> What I don't understand is why it is set correctly for the first, +> +> autogenerated virtio-net-ccw device, but not for the second one, and +> +> why virtio-net-pci doesn't show these problems. The only difference +> +> between -ccw and -pci that comes to my mind here is that config space +> +> accesses for ccw are done via an asynchronous operation, so timing +> +> might be different. +> +> +Hopefully Jason has an idea. Could you post a full command line +> +please? Do you need a working guest to trigger this? Does this trigger +> +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 + +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) + +> +> +> > > +> +> > > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only +> +> > > the autogenerated virtio-net-ccw device is present) works. Specifying +> +> > > several "-device virtio-net-pci" works as well. +> +> > > +> +> > > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net +> +> > > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") +> +> > > works (in-between state does not compile). +> +> > +> +> > Ouch. I didn't test all in-between states :( +> +> > But I wish we had a 0-day instrastructure like kernel has, +> +> > that catches things like that. 
+> +> +> +> Yep, that would be useful... so patchew only builds the complete series? +> +> +> +> > +> +> > > This is reproducible with tcg as well. Same problem both with +> +> > > --enable-vhost-vdpa and --disable-vhost-vdpa. +> +> > > +> +> > > Have not yet tried to figure out what might be special with +> +> > > virtio-ccw... anyone have an idea? +> +> > > +> +> > > [This should probably be considered a blocker?] +> +> +> +> I think so, as it makes s390x unusable with more that one +> +> virtio-net-ccw device, and I don't even see a workaround. +> + +On 2020/7/24 下午11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin" wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. Tsirkin" wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. 
+Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me like we forgot to check the existence of the peer. + +Please try the attached patch to see if it works. + +Thanks +0001-virtio-net-check-the-existence-of-peer-before-accesi.patch +Description: +Text Data + +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang wrote: + +> +On 2020/7/24 下午11:34, Cornelia Huck wrote: +> +> On Fri, 24 Jul 2020 11:17:57 -0400 +> +> "Michael S. Tsirkin" wrote: +> +> +> +>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +>>> On Fri, 24 Jul 2020 09:30:58 -0400 +> +>>> "Michael S. Tsirkin" wrote: +> +>>> +> +>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding +> +>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get +> +>>>>> a segfault.
gdb points to +> +>>>>> +> +>>>>> #0 0x000055d6ab52681d in virtio_net_get_config (vdev=, +> +>>>>> config=0x55d6ad9e3f80 "RT") at +> +>>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +>>>>> 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +>>>>> +> +>>>>> (backtrace doesn't go further) +> +>>> The core was incomplete, but running under gdb directly shows that it +> +>>> is just a bog-standard config space access (first for that device). +> +>>> +> +>>> The cause of the crash is that nc->peer is not set... no idea how that +> +>>> can happen, not that familiar with that part of QEMU. (Should the code +> +>>> check, or is that really something that should not happen?) +> +>>> +> +>>> What I don't understand is why it is set correctly for the first, +> +>>> autogenerated virtio-net-ccw device, but not for the second one, and +> +>>> why virtio-net-pci doesn't show these problems. The only difference +> +>>> between -ccw and -pci that comes to my mind here is that config space +> +>>> accesses for ccw are done via an asynchronous operation, so timing +> +>>> might be different. +> +>> Hopefully Jason has an idea. Could you post a full command line +> +>> please? Do you need a working guest to trigger this? Does this trigger +> +>> on an x86 host? +> +> Yes, it does trigger with tcg-on-x86 as well. I've been using +> +> +> +> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu +> +> qemu,zpci=on +> +> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +> +> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +> +> -device +> +> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +> +> -device virtio-net-ccw +> +> +> +> It seems it needs the guest actually doing something with the nics; I +> +> cannot reproduce the crash if I use the old advent calendar moon buggy +> +> image and just add a virtio-net-ccw device. 
+> +> +> +> (I don't think it's a problem with my local build, as I see the problem +> +> both on my laptop and on an LPAR.) +> +> +> +It looks to me we forget the check the existence of peer. +> +> +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? + +On 2020/7/27 下午2:43, Cornelia Huck wrote: +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang wrote: +On 2020/7/24 下午11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin" wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. Tsirkin" wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. +Hopefully Jason has an idea. 
Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me we forget the check the existence of peer. + +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? +It can be hit with virtio-net-pci as well (just start without peer). +For autogenerated virtio-net-cww, I think the reason is that it has +already had a peer set. +Thanks + +On Mon, 27 Jul 2020 15:38:12 +0800 +Jason Wang wrote: + +> +On 2020/7/27 下午2:43, Cornelia Huck wrote: +> +> On Sat, 25 Jul 2020 08:40:07 +0800 +> +> Jason Wang wrote: +> +> +> +>> On 2020/7/24 下午11:34, Cornelia Huck wrote: +> +>>> On Fri, 24 Jul 2020 11:17:57 -0400 +> +>>> "Michael S. Tsirkin" wrote: +> +>>> +> +>>>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +>>>>> On Fri, 24 Jul 2020 09:30:58 -0400 +> +>>>>> "Michael S. Tsirkin" wrote: +> +>>>>> +> +>>>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +>>>>>>> When I start qemu with a second virtio-net-ccw device (i.e. 
adding +> +>>>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get +> +>>>>>>> a segfault. gdb points to +> +>>>>>>> +> +>>>>>>> #0 0x000055d6ab52681d in virtio_net_get_config (vdev=, +> +>>>>>>> config=0x55d6ad9e3f80 "RT") at +> +>>>>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +>>>>>>> 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +>>>>>>> +> +>>>>>>> (backtrace doesn't go further) +> +>>>>> The core was incomplete, but running under gdb directly shows that it +> +>>>>> is just a bog-standard config space access (first for that device). +> +>>>>> +> +>>>>> The cause of the crash is that nc->peer is not set... no idea how that +> +>>>>> can happen, not that familiar with that part of QEMU. (Should the code +> +>>>>> check, or is that really something that should not happen?) +> +>>>>> +> +>>>>> What I don't understand is why it is set correctly for the first, +> +>>>>> autogenerated virtio-net-ccw device, but not for the second one, and +> +>>>>> why virtio-net-pci doesn't show these problems. The only difference +> +>>>>> between -ccw and -pci that comes to my mind here is that config space +> +>>>>> accesses for ccw are done via an asynchronous operation, so timing +> +>>>>> might be different. +> +>>>> Hopefully Jason has an idea. Could you post a full command line +> +>>>> please? Do you need a working guest to trigger this? Does this trigger +> +>>>> on an x86 host? +> +>>> Yes, it does trigger with tcg-on-x86 as well. 
I've been using +> +>>> +> +>>> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu +> +>>> qemu,zpci=on +> +>>> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +> +>>> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +> +>>> -device +> +>>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +> +>>> -device virtio-net-ccw +> +>>> +> +>>> It seems it needs the guest actually doing something with the nics; I +> +>>> cannot reproduce the crash if I use the old advent calendar moon buggy +> +>>> image and just add a virtio-net-ccw device. +> +>>> +> +>>> (I don't think it's a problem with my local build, as I see the problem +> +>>> both on my laptop and on an LPAR.) +> +>> +> +>> It looks to me we forget the check the existence of peer. +> +>> +> +>> Please try the attached patch to see if it works. +> +> Thanks, that patch gets my guest up and running again. So, FWIW, +> +> +> +> Tested-by: Cornelia Huck +> +> +> +> Any idea why this did not hit with virtio-net-pci (or the autogenerated +> +> virtio-net-ccw device)? +> +> +> +It can be hit with virtio-net-pci as well (just start without peer). +Hm, I had not been able to reproduce the crash with a 'naked' -device +virtio-net-pci. But checking seems to be the right idea anyway. + +> +> +For autogenerated virtio-net-cww, I think the reason is that it has +> +already had a peer set. +Ok, that might well be. + +On 2020/7/27 下午4:41, Cornelia Huck wrote: +On Mon, 27 Jul 2020 15:38:12 +0800 +Jason Wang wrote: +On 2020/7/27 下午2:43, Cornelia Huck wrote: +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang wrote: +On 2020/7/24 下午11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin" wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. 
Tsirkin" wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. 
+ +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me we forget the check the existence of peer. + +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? +It can be hit with virtio-net-pci as well (just start without peer). +Hm, I had not been able to reproduce the crash with a 'naked' -device +virtio-net-pci. But checking seems to be the right idea anyway. +Sorry for being unclear, I meant for networking part, you just need +start without peer, and you need a real guest (any Linux) that is trying +to access the config space of virtio-net. +Thanks +For autogenerated virtio-net-cww, I think the reason is that it has +already had a peer set. +Ok, that might well be. + +On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: +> +> +On 2020/7/27 下午4:41, Cornelia Huck wrote: +> +> On Mon, 27 Jul 2020 15:38:12 +0800 +> +> Jason Wang wrote: +> +> +> +> > On 2020/7/27 下午2:43, Cornelia Huck wrote: +> +> > > On Sat, 25 Jul 2020 08:40:07 +0800 +> +> > > Jason Wang wrote: +> +> > > > On 2020/7/24 下午11:34, Cornelia Huck wrote: +> +> > > > > On Fri, 24 Jul 2020 11:17:57 -0400 +> +> > > > > "Michael S. Tsirkin" wrote: +> +> > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +> > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400 +> +> > > > > > > "Michael S. Tsirkin" wrote: +> +> > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +> > > > > > > > > When I start qemu with a second virtio-net-ccw device (i.e. +> +> > > > > > > > > adding +> +> > > > > > > > > -device virtio-net-ccw in addition to the autogenerated +> +> > > > > > > > > device), I get +> +> > > > > > > > > a segfault. 
gdb points to +> +> > > > > > > > > +> +> > > > > > > > > #0 0x000055d6ab52681d in virtio_net_get_config +> +> > > > > > > > > (vdev=, +> +> > > > > > > > > config=0x55d6ad9e3f80 "RT") at +> +> > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> > > > > > > > > 146 if (nc->peer->info->type == +> +> > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> > > > > > > > > +> +> > > > > > > > > (backtrace doesn't go further) +> +> > > > > > > The core was incomplete, but running under gdb directly shows +> +> > > > > > > that it +> +> > > > > > > is just a bog-standard config space access (first for that +> +> > > > > > > device). +> +> > > > > > > +> +> > > > > > > The cause of the crash is that nc->peer is not set... no idea +> +> > > > > > > how that +> +> > > > > > > can happen, not that familiar with that part of QEMU. (Should +> +> > > > > > > the code +> +> > > > > > > check, or is that really something that should not happen?) +> +> > > > > > > +> +> > > > > > > What I don't understand is why it is set correctly for the +> +> > > > > > > first, +> +> > > > > > > autogenerated virtio-net-ccw device, but not for the second +> +> > > > > > > one, and +> +> > > > > > > why virtio-net-pci doesn't show these problems. The only +> +> > > > > > > difference +> +> > > > > > > between -ccw and -pci that comes to my mind here is that config +> +> > > > > > > space +> +> > > > > > > accesses for ccw are done via an asynchronous operation, so +> +> > > > > > > timing +> +> > > > > > > might be different. +> +> > > > > > Hopefully Jason has an idea. Could you post a full command line +> +> > > > > > please? Do you need a working guest to trigger this? Does this +> +> > > > > > trigger +> +> > > > > > on an x86 host? +> +> > > > > Yes, it does trigger with tcg-on-x86 as well. 
I've been using +> +> > > > > +> +> > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu +> +> > > > > qemu,zpci=on +> +> > > > > -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +> +> > > > > -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +> +> > > > > -device +> +> > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +> +> > > > > -device virtio-net-ccw +> +> > > > > +> +> > > > > It seems it needs the guest actually doing something with the nics; +> +> > > > > I +> +> > > > > cannot reproduce the crash if I use the old advent calendar moon +> +> > > > > buggy +> +> > > > > image and just add a virtio-net-ccw device. +> +> > > > > +> +> > > > > (I don't think it's a problem with my local build, as I see the +> +> > > > > problem +> +> > > > > both on my laptop and on an LPAR.) +> +> > > > It looks to me we forget the check the existence of peer. +> +> > > > +> +> > > > Please try the attached patch to see if it works. +> +> > > Thanks, that patch gets my guest up and running again. So, FWIW, +> +> > > +> +> > > Tested-by: Cornelia Huck +> +> > > +> +> > > Any idea why this did not hit with virtio-net-pci (or the autogenerated +> +> > > virtio-net-ccw device)? +> +> > +> +> > It can be hit with virtio-net-pci as well (just start without peer). +> +> Hm, I had not been able to reproduce the crash with a 'naked' -device +> +> virtio-net-pci. But checking seems to be the right idea anyway. +> +> +> +Sorry for being unclear, I meant for networking part, you just need start +> +without peer, and you need a real guest (any Linux) that is trying to access +> +the config space of virtio-net. +> +> +Thanks +A pxe guest will do it, but that doesn't support ccw, right? + +I'm still unclear why this triggers with ccw but not pci - +any idea? + +> +> +> +> +> > For autogenerated virtio-net-cww, I think the reason is that it has +> +> > already had a peer set. 
+> +> Ok, that might well be. +> +> +> +> + +On 2020/7/27 下午7:43, Michael S. Tsirkin wrote: +On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: +On 2020/7/27 下午4:41, Cornelia Huck wrote: +On Mon, 27 Jul 2020 15:38:12 +0800 +Jason Wang wrote: +On 2020/7/27 下午2:43, Cornelia Huck wrote: +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang wrote: +On 2020/7/24 下午11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin" wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. Tsirkin" wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. 
I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me we forget the check the existence of peer. + +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? +It can be hit with virtio-net-pci as well (just start without peer). +Hm, I had not been able to reproduce the crash with a 'naked' -device +virtio-net-pci. But checking seems to be the right idea anyway. +Sorry for being unclear, I meant for networking part, you just need start +without peer, and you need a real guest (any Linux) that is trying to access +the config space of virtio-net. + +Thanks +A pxe guest will do it, but that doesn't support ccw, right? +Yes, it depends on the cli actually. +I'm still unclear why this triggers with ccw but not pci - +any idea? +I don't test pxe but I can reproduce this with pci (just start a linux +guest without a peer). +Thanks + +On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote: +> +> +On 2020/7/27 下午7:43, Michael S. 
Tsirkin wrote: +> +> On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: +> +> > On 2020/7/27 下午4:41, Cornelia Huck wrote: +> +> > > On Mon, 27 Jul 2020 15:38:12 +0800 +> +> > > Jason Wang wrote: +> +> > > +> +> > > > On 2020/7/27 下午2:43, Cornelia Huck wrote: +> +> > > > > On Sat, 25 Jul 2020 08:40:07 +0800 +> +> > > > > Jason Wang wrote: +> +> > > > > > On 2020/7/24 下午11:34, Cornelia Huck wrote: +> +> > > > > > > On Fri, 24 Jul 2020 11:17:57 -0400 +> +> > > > > > > "Michael S. Tsirkin" wrote: +> +> > > > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +> > > > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400 +> +> > > > > > > > > "Michael S. Tsirkin" wrote: +> +> > > > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck +> +> > > > > > > > > > wrote: +> +> > > > > > > > > > > When I start qemu with a second virtio-net-ccw device +> +> > > > > > > > > > > (i.e. adding +> +> > > > > > > > > > > -device virtio-net-ccw in addition to the autogenerated +> +> > > > > > > > > > > device), I get +> +> > > > > > > > > > > a segfault. gdb points to +> +> > > > > > > > > > > +> +> > > > > > > > > > > #0 0x000055d6ab52681d in virtio_net_get_config +> +> > > > > > > > > > > (vdev=, +> +> > > > > > > > > > > config=0x55d6ad9e3f80 "RT") at +> +> > > > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> > > > > > > > > > > 146 if (nc->peer->info->type == +> +> > > > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> > > > > > > > > > > +> +> > > > > > > > > > > (backtrace doesn't go further) +> +> > > > > > > > > The core was incomplete, but running under gdb directly +> +> > > > > > > > > shows that it +> +> > > > > > > > > is just a bog-standard config space access (first for that +> +> > > > > > > > > device). +> +> > > > > > > > > +> +> > > > > > > > > The cause of the crash is that nc->peer is not set... 
no +> +> > > > > > > > > idea how that +> +> > > > > > > > > can happen, not that familiar with that part of QEMU. +> +> > > > > > > > > (Should the code +> +> > > > > > > > > check, or is that really something that should not happen?) +> +> > > > > > > > > +> +> > > > > > > > > What I don't understand is why it is set correctly for the +> +> > > > > > > > > first, +> +> > > > > > > > > autogenerated virtio-net-ccw device, but not for the second +> +> > > > > > > > > one, and +> +> > > > > > > > > why virtio-net-pci doesn't show these problems. The only +> +> > > > > > > > > difference +> +> > > > > > > > > between -ccw and -pci that comes to my mind here is that +> +> > > > > > > > > config space +> +> > > > > > > > > accesses for ccw are done via an asynchronous operation, so +> +> > > > > > > > > timing +> +> > > > > > > > > might be different. +> +> > > > > > > > Hopefully Jason has an idea. Could you post a full command +> +> > > > > > > > line +> +> > > > > > > > please? Do you need a working guest to trigger this? Does +> +> > > > > > > > this trigger +> +> > > > > > > > on an x86 host? +> +> > > > > > > Yes, it does trigger with tcg-on-x86 as well. 
I've been using +> +> > > > > > > +> +> > > > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg +> +> > > > > > > -cpu qemu,zpci=on +> +> > > > > > > -m 1024 -nographic -device +> +> > > > > > > virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +> +> > > > > > > -drive +> +> > > > > > > file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +> +> > > > > > > -device +> +> > > > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +> +> > > > > > > -device virtio-net-ccw +> +> > > > > > > +> +> > > > > > > It seems it needs the guest actually doing something with the +> +> > > > > > > nics; I +> +> > > > > > > cannot reproduce the crash if I use the old advent calendar +> +> > > > > > > moon buggy +> +> > > > > > > image and just add a virtio-net-ccw device. +> +> > > > > > > +> +> > > > > > > (I don't think it's a problem with my local build, as I see the +> +> > > > > > > problem +> +> > > > > > > both on my laptop and on an LPAR.) +> +> > > > > > It looks to me we forget the check the existence of peer. +> +> > > > > > +> +> > > > > > Please try the attached patch to see if it works. +> +> > > > > Thanks, that patch gets my guest up and running again. So, FWIW, +> +> > > > > +> +> > > > > Tested-by: Cornelia Huck +> +> > > > > +> +> > > > > Any idea why this did not hit with virtio-net-pci (or the +> +> > > > > autogenerated +> +> > > > > virtio-net-ccw device)? +> +> > > > It can be hit with virtio-net-pci as well (just start without peer). +> +> > > Hm, I had not been able to reproduce the crash with a 'naked' -device +> +> > > virtio-net-pci. But checking seems to be the right idea anyway. +> +> > Sorry for being unclear, I meant for networking part, you just need start +> +> > without peer, and you need a real guest (any Linux) that is trying to +> +> > access +> +> > the config space of virtio-net. +> +> > +> +> > Thanks +> +> A pxe guest will do it, but that doesn't support ccw, right? 
+> +> +> +Yes, it depends on the cli actually. +> +> +> +> +> +> I'm still unclear why this triggers with ccw but not pci - +> +> any idea? +> +> +> +I don't test pxe but I can reproduce this with pci (just start a linux guest +> +without a peer). +> +> +Thanks +> +Might be a good addition to a unit test. Not sure what would the +test do exactly: just make sure guest runs? Looks like a lot of work +for an empty test ... maybe we can poke at the guest config with +qtest commands at least. + +-- +MST + +On 2020/7/27 下午9:16, Michael S. Tsirkin wrote: +On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote: +On 2020/7/27 下午7:43, Michael S. Tsirkin wrote: +On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: +On 2020/7/27 下午4:41, Cornelia Huck wrote: +On Mon, 27 Jul 2020 15:38:12 +0800 +Jason Wang wrote: +On 2020/7/27 下午2:43, Cornelia Huck wrote: +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang wrote: +On 2020/7/24 下午11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin" wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. Tsirkin" wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) 
+ +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me we forget the check the existence of peer. + +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? +It can be hit with virtio-net-pci as well (just start without peer). +Hm, I had not been able to reproduce the crash with a 'naked' -device +virtio-net-pci. But checking seems to be the right idea anyway. +Sorry for being unclear, I meant for networking part, you just need start +without peer, and you need a real guest (any Linux) that is trying to access +the config space of virtio-net. 
+
+Thanks
+A pxe guest will do it, but that doesn't support ccw, right?
+Yes, it depends on the cli actually.
+I'm still unclear why this triggers with ccw but not pci -
+any idea?
+I didn't test pxe, but I can reproduce this with pci (just start a linux guest
+without a peer).
+
+Thanks
+Might be a good addition to a unit test. Not sure what would the
+test do exactly: just make sure guest runs? Looks like a lot of work
+for an empty test ... maybe we can poke at the guest config with
+qtest commands at least.
+That should work, or we can simply extend the existing virtio-net qtest to
+do that.
+Thanks
+
diff --git a/classification_output/01/mistranslation/80615920 b/classification_output/01/mistranslation/80615920
new file mode 100644
index 00000000..97712c2f
--- /dev/null
+++ b/classification_output/01/mistranslation/80615920
@@ -0,0 +1,348 @@
+mistranslation: 0.800
+other: 0.786
+instruction: 0.751
+semantic: 0.737
+
+[BUG] accel/tcg: cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
+
+It seems there is a bug in SIGALRM handling when a 486 system emulates x86_64
+code.
+
+This code:
+
+#include <pthread.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+pthread_t thread1, thread2;
+
+// Signal handler for SIGALRM
+void alarm_handler(int sig) {
+    // Do nothing, just wake up the other thread
+}
+
+// Thread 1 function
+void* thread1_func(void* arg) {
+    // Set up the signal handler for SIGALRM
+    signal(SIGALRM, alarm_handler);
+
+    // Wait for 1 second
+    sleep(1);
+
+    // Send SIGALRM signal to thread 2
+    pthread_kill(thread2, SIGALRM);
+
+    return NULL;
+}
+
+// Thread 2 function
+void* thread2_func(void* arg) {
+    // Wait for the SIGALRM signal
+    pause();
+
+    printf("Thread 2 woke up!\n");
+
+    return NULL;
+}
+
+int main() {
+    // Create thread 1
+    if (pthread_create(&thread1, NULL, thread1_func, NULL) != 0) {
+        fprintf(stderr, "Failed to create thread 1\n");
+        return 1;
+    }
+
+    // Create thread 2
+    if (pthread_create(&thread2, NULL, thread2_func, NULL) != 0) {
+        fprintf(stderr, "Failed to create thread 2\n");
+        return 1;
+    }
+
+    // Wait for both threads to finish
+    pthread_join(thread1, NULL);
+    pthread_join(thread2, NULL);
+
+    return 0;
+}
+
+
+Fails with this -strace log (there are also unsupported syscalls 334 and 435,
+but it seems it doesn't affect the code much):
+
+...
+736 rt_sigaction(SIGALRM,0x000000001123ec20,0x000000001123ecc0) = 0
+736 clock_nanosleep(CLOCK_REALTIME,0,{tv_sec = 1,tv_nsec = 0},{tv_sec =
+1,tv_nsec = 0})
+736 rt_sigprocmask(SIG_BLOCK,0x00000000109fad20,0x0000000010800b38,8) = 0
+736 Unknown syscall 435
+736
+clone(CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|
+ ...
+736 rt_sigprocmask(SIG_SETMASK,0x0000000010800b38,NULL,8) +736 set_robust_list(0x11a419a0,0) = -1 errno=38 (Function not implemented) +736 rt_sigprocmask(SIG_SETMASK,0x0000000011a41fb0,NULL,8) = 0 + = 0 +736 pause(0,0,2,277186368,0,295966400) +736 +futex(0x000000001123f990,FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET,738,NULL,NULL,0) + = 0 +736 rt_sigprocmask(SIG_BLOCK,0x00000000109fad20,0x000000001123ee88,8) = 0 +736 getpid() = 736 +736 tgkill(736,739,SIGALRM) = 0 + = -1 errno=4 (Interrupted system call) +--- SIGALRM {si_signo=SIGALRM, si_code=SI_TKILL, si_pid=736, si_uid=0} --- +0x48874a != 0x3c69e10 +736 rt_sigprocmask(SIG_SETMASK,0x000000001123ee88,NULL,8) = 0 +** +ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: +(cpu == current_cpu) +Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion +failed: (cpu == current_cpu) +0x48874a != 0x3c69e10 +** +ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: +(cpu == current_cpu) +Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion +failed: (cpu == current_cpu) +# + +The code fails either with or without -singlestep, the command line: + +/usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep /opt/x86_64/alarm.bin + +Source code of QEMU 8.1.1 was modified with patch "[PATCH] qemu/timer: Don't +use RDTSC on i486" [1], +with added few ioctls (not relevant) and cpu_exec_longjmp_cleanup() now prints +current pointers of +cpu and current_cpu (line "0x48874a != 0x3c69e10"). 
+ +config.log (built as a part of buildroot, basically the minimal possible +configuration for running x86_64 on 486): + +# Configured with: +'/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/build/qemu-8.1.1/configure' + '--prefix=/usr' +'--cross-prefix=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/i486-buildroot-linux-gnu-' + '--audio-drv-list=' +'--python=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/python3' + +'--ninja=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/ninja' +'--disable-alsa' '--disable-bpf' '--disable-brlapi' '--disable-bsd-user' +'--disable-cap-ng' '--disable-capstone' '--disable-containers' +'--disable-coreaudio' '--disable-curl' '--disable-curses' +'--disable-dbus-display' '--disable-docs' '--disable-dsound' '--disable-hvf' +'--disable-jack' '--disable-libiscsi' '--disable-linux-aio' +'--disable-linux-io-uring' '--disable-malloc-trim' '--disable-membarrier' +'--disable-mpath' '--disable-netmap' '--disable-opengl' '--disable-oss' +'--disable-pa' '--disable-rbd' '--disable-sanitizers' '--disable-selinux' +'--disable-sparse' '--disable-strip' '--disable-vde' '--disable-vhost-crypto' +'--disable-vhost-user-blk-server' '--disable-virtfs' '--disable-whpx' +'--disable-xen' '--disable-attr' '--disable-kvm' '--disable-vhost-net' +'--disable-download' '--disable-hexagon-idef-parser' '--disable-system' +'--enable-linux-user' '--target-list=x86_64-linux-user' '--disable-vhost-user' +'--disable-slirp' '--disable-sdl' '--disable-fdt' '--enable-trace-backends=nop' +'--disable-tools' '--disable-guest-agent' '--disable-fuse' +'--disable-fuse-lseek' '--disable-seccomp' '--disable-libssh' +'--disable-libusb' '--disable-vnc' '--disable-nettle' '--disable-numa' +'--disable-pipewire' '--disable-spice' '--disable-usb-redir' +'--disable-install-blobs' + +Emulation of the same x86_64 code with qemu 6.2.0 installed on another x86_64 +native machine works fine. 
+ +[1] +https://lists.nongnu.org/archive/html/qemu-devel/2023-11/msg05387.html +Best regards, +Petr + +On Sat, 25 Nov 2023 at 13:09, Petr Cvek wrote: +> +> +It seems there is a bug in SIGALRM handling when 486 system emulates x86_64 +> +code. +486 host is pretty well out of support currently. Can you reproduce +this on a less ancient host CPU type ? + +> +ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: +> +(cpu == current_cpu) +> +Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: +> +assertion failed: (cpu == current_cpu) +> +0x48874a != 0x3c69e10 +> +** +> +ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: +> +(cpu == current_cpu) +> +Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: +> +assertion failed: (cpu == current_cpu) +What compiler version do you build QEMU with? That +assert is there because we have seen some buggy compilers +in the past which don't correctly preserve the variable +value as the setjmp/longjmp spec requires them to. + +thanks +-- PMM + +Dne 27. 11. 23 v 10:37 Peter Maydell napsal(a): +> +On Sat, 25 Nov 2023 at 13:09, Petr Cvek wrote: +> +> +> +> It seems there is a bug in SIGALRM handling when 486 system emulates x86_64 +> +> code. +> +> +486 host is pretty well out of support currently. Can you reproduce +> +this on a less ancient host CPU type ? +> +It seems it only fails when the code is compiled for i486. QEMU built with the +same compiler with -march=i586 and above runs on the same physical hardware +without a problem. All -march= variants were executed on ryzen 3600. + +> +> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion +> +> failed: (cpu == current_cpu) +> +> Bail out! 
ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: +> +> assertion failed: (cpu == current_cpu) +> +> 0x48874a != 0x3c69e10 +> +> ** +> +> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion +> +> failed: (cpu == current_cpu) +> +> Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: +> +> assertion failed: (cpu == current_cpu) +> +> +What compiler version do you build QEMU with? That +> +assert is there because we have seen some buggy compilers +> +in the past which don't correctly preserve the variable +> +value as the setjmp/longjmp spec requires them to. +> +i486 and i586+ code variants were compiled with GCC 13.2.0 (more exactly, +slackware64 current multilib distribution). + +i486 binary which runs on the real 486 is also GCC 13.2.0 and installed as a +part of the buildroot crosscompiler (about two week old git snapshot). + +> +thanks +> +-- PMM +best regards, +Petr + +On 11/25/23 07:08, Petr Cvek wrote: +ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: +(cpu == current_cpu) +Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion +failed: (cpu == current_cpu) +# + +The code fails either with or without -singlestep, the command line: + +/usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep /opt/x86_64/alarm.bin + +Source code of QEMU 8.1.1 was modified with patch "[PATCH] qemu/timer: Don't use +RDTSC on i486" [1], +with added few ioctls (not relevant) and cpu_exec_longjmp_cleanup() now prints +current pointers of +cpu and current_cpu (line "0x48874a != 0x3c69e10"). +If you try this again with 8.2-rc2, you should not see an assertion failure. +You should see instead + +QEMU internal SIGILL {code=ILLOPC, addr=0x12345678} +which I think more accurately summarizes the situation of attempting RDTSC on hardware +that does not support it. +r~ + +Dne 29. 11. 
23 v 15:25 Richard Henderson napsal(a): +> +On 11/25/23 07:08, Petr Cvek wrote: +> +> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion +> +> failed: (cpu == current_cpu) +> +> Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: +> +> assertion failed: (cpu == current_cpu) +> +> # +> +> +> +> The code fails either with or without -singlestep, the command line: +> +> +> +> /usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep +> +> /opt/x86_64/alarm.bin +> +> +> +> Source code of QEMU 8.1.1 was modified with patch "[PATCH] qemu/timer: Don't +> +> use RDTSC on i486" [1], +> +> with added few ioctls (not relevant) and cpu_exec_longjmp_cleanup() now +> +> prints current pointers of +> +> cpu and current_cpu (line "0x48874a != 0x3c69e10"). +> +> +> +If you try this again with 8.2-rc2, you should not see an assertion failure. +> +You should see instead +> +> +QEMU internal SIGILL {code=ILLOPC, addr=0x12345678} +> +> +which I think more accurately summarizes the situation of attempting RDTSC on +> +hardware that does not support it. +> +> +Compilation of vanilla qemu v8.2.0-rc2 with -march=i486 by GCC 13.2.0 and +running the resulting binary on ryzen still leads to: + +** +ERROR:../accel/tcg/cpu-exec.c:533:cpu_exec_longjmp_cleanup: assertion failed: +(cpu == current_cpu) +Bail out! ERROR:../accel/tcg/cpu-exec.c:533:cpu_exec_longjmp_cleanup: assertion +failed: (cpu == current_cpu) +Aborted + +> +> +r~ +Petr + diff --git a/classification_output/01/mistranslation/8720260 b/classification_output/01/mistranslation/8720260 deleted file mode 100644 index 32d247ac..00000000 --- a/classification_output/01/mistranslation/8720260 +++ /dev/null @@ -1,344 +0,0 @@ -mistranslation: 0.752 -instruction: 0.700 -other: 0.683 -semantic: 0.669 - -[Bug Report][RFC PATCH 0/1] block: fix failing assert on paused VM migration - -There's a bug (failing assert) which is reproduced during migration of -a paused VM. 
I am able to reproduce it on a stand with 2 nodes and a common -NFS share, with VM's disk on that share. - -root@fedora40-1-vm:~# virsh domblklist alma8-vm - Target Source ------------------------------------------- - sda /mnt/shared/images/alma8.qcow2 - -root@fedora40-1-vm:~# df -Th /mnt/shared -Filesystem Type Size Used Avail Use% Mounted on -127.0.0.1:/srv/nfsd nfs4 63G 16G 48G 25% /mnt/shared - -On the 1st node: - -root@fedora40-1-vm:~# virsh start alma8-vm ; virsh suspend alma8-vm -root@fedora40-1-vm:~# virsh migrate --compressed --p2p --persistent ---undefinesource --live alma8-vm qemu+ssh://fedora40-2-vm/system - -Then on the 2nd node: - -root@fedora40-2-vm:~# virsh migrate --compressed --p2p --persistent ---undefinesource --live alma8-vm qemu+ssh://fedora40-1-vm/system -error: operation failed: domain is not running - -root@fedora40-2-vm:~# tail -3 /var/log/libvirt/qemu/alma8-vm.log -2024-09-19 13:53:33.336+0000: initiating migration -qemu-system-x86_64: ../block.c:6976: int -bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & -BDRV_O_INACTIVE)' failed. 
-2024-09-19 13:53:42.991+0000: shutting down, reason=crashed - -Backtrace: - -(gdb) bt -#0 0x00007f7eaa2f1664 in __pthread_kill_implementation () at /lib64/libc.so.6 -#1 0x00007f7eaa298c4e in raise () at /lib64/libc.so.6 -#2 0x00007f7eaa280902 in abort () at /lib64/libc.so.6 -#3 0x00007f7eaa28081e in __assert_fail_base.cold () at /lib64/libc.so.6 -#4 0x00007f7eaa290d87 in __assert_fail () at /lib64/libc.so.6 -#5 0x0000563c38b95eb8 in bdrv_inactivate_recurse (bs=0x563c3b6c60c0) at -../block.c:6976 -#6 0x0000563c38b95aeb in bdrv_inactivate_all () at ../block.c:7038 -#7 0x0000563c3884d354 in qemu_savevm_state_complete_precopy_non_iterable -(f=0x563c3b700c20, in_postcopy=false, inactivate_disks=true) - at ../migration/savevm.c:1571 -#8 0x0000563c3884dc1a in qemu_savevm_state_complete_precopy (f=0x563c3b700c20, -iterable_only=false, inactivate_disks=true) at ../migration/savevm.c:1631 -#9 0x0000563c3883a340 in migration_completion_precopy (s=0x563c3b4d51f0, -current_active_state=) at ../migration/migration.c:2780 -#10 migration_completion (s=0x563c3b4d51f0) at ../migration/migration.c:2844 -#11 migration_iteration_run (s=0x563c3b4d51f0) at ../migration/migration.c:3270 -#12 migration_thread (opaque=0x563c3b4d51f0) at ../migration/migration.c:3536 -#13 0x0000563c38dbcf14 in qemu_thread_start (args=0x563c3c2d5bf0) at -../util/qemu-thread-posix.c:541 -#14 0x00007f7eaa2ef6d7 in start_thread () at /lib64/libc.so.6 -#15 0x00007f7eaa373414 in clone () at /lib64/libc.so.6 - -What happens here is that after 1st migration BDS related to HDD remains -inactive as VM is still paused. Then when we initiate 2nd migration, -bdrv_inactivate_all() leads to the attempt to set BDRV_O_INACTIVE flag -on that node which is already set, thus assert fails. - -Attached patch which simply skips setting flag if it's already set is more -of a kludge than a clean solution. 
Should we use more sophisticated logic -which allows some of the nodes be in inactive state prior to the migration, -and takes them into account during bdrv_inactivate_all()? Comments would -be appreciated. - -Andrey - -Andrey Drobyshev (1): - block: do not fail when inactivating node which is inactive - - block.c | 10 +++++++++- - 1 file changed, 9 insertions(+), 1 deletion(-) - --- -2.39.3 - -Instead of throwing an assert let's just ignore that flag is already set -and return. We assume that it's going to be safe to ignore. Otherwise -this assert fails when migrating a paused VM back and forth. - -Ideally we'd like to have a more sophisticated solution, e.g. not even -scan the nodes which should be inactive at this point. - -Signed-off-by: Andrey Drobyshev ---- - block.c | 10 +++++++++- - 1 file changed, 9 insertions(+), 1 deletion(-) - -diff --git a/block.c b/block.c -index 7d90007cae..c1dcf906d1 100644 ---- a/block.c -+++ b/block.c -@@ -6973,7 +6973,15 @@ static int GRAPH_RDLOCK -bdrv_inactivate_recurse(BlockDriverState *bs) - return 0; - } - -- assert(!(bs->open_flags & BDRV_O_INACTIVE)); -+ if (bs->open_flags & BDRV_O_INACTIVE) { -+ /* -+ * Return here instead of throwing assert as a workaround to -+ * prevent failure on migrating paused VM. -+ * Here we assume that if we're trying to inactivate BDS that's -+ * already inactive, it's safe to just ignore it. -+ */ -+ return 0; -+ } - - /* Inactivate this node */ - if (bs->drv->bdrv_inactivate) { --- -2.39.3 - -[add migration maintainers] - -On 24.09.24 15:56, Andrey Drobyshev wrote: -Instead of throwing an assert let's just ignore that flag is already set -and return. We assume that it's going to be safe to ignore. Otherwise -this assert fails when migrating a paused VM back and forth. - -Ideally we'd like to have a more sophisticated solution, e.g. not even -scan the nodes which should be inactive at this point. 
- -Signed-off-by: Andrey Drobyshev ---- - block.c | 10 +++++++++- - 1 file changed, 9 insertions(+), 1 deletion(-) - -diff --git a/block.c b/block.c -index 7d90007cae..c1dcf906d1 100644 ---- a/block.c -+++ b/block.c -@@ -6973,7 +6973,15 @@ static int GRAPH_RDLOCK -bdrv_inactivate_recurse(BlockDriverState *bs) - return 0; - } -- assert(!(bs->open_flags & BDRV_O_INACTIVE)); -+ if (bs->open_flags & BDRV_O_INACTIVE) { -+ /* -+ * Return here instead of throwing assert as a workaround to -+ * prevent failure on migrating paused VM. -+ * Here we assume that if we're trying to inactivate BDS that's -+ * already inactive, it's safe to just ignore it. -+ */ -+ return 0; -+ } -/* Inactivate this node */ -if (bs->drv->bdrv_inactivate) { -I doubt that this a correct way to go. - -As far as I understand, "inactive" actually means that "storage is not belong to -qemu, but to someone else (another qemu process for example), and may be changed -transparently". In turn this means that Qemu should do nothing with inactive disks. So the -problem is that nobody called bdrv_activate_all on target, and we shouldn't ignore that. - -Hmm, I see in process_incoming_migration_bh() we do call bdrv_activate_all(), -but only in some scenarios. May be, the condition should be less strict here. - -Why we need any condition here at all? Don't we want to activate block-layer on -target after migration anyway? - --- -Best regards, -Vladimir - -On 9/30/24 12:25 PM, Vladimir Sementsov-Ogievskiy wrote: -> -[add migration maintainers] -> -> -On 24.09.24 15:56, Andrey Drobyshev wrote: -> -> [...] -> -> -I doubt that this a correct way to go. -> -> -As far as I understand, "inactive" actually means that "storage is not -> -belong to qemu, but to someone else (another qemu process for example), -> -and may be changed transparently". In turn this means that Qemu should -> -do nothing with inactive disks. So the problem is that nobody called -> -bdrv_activate_all on target, and we shouldn't ignore that. 
-> -> -Hmm, I see in process_incoming_migration_bh() we do call -> -bdrv_activate_all(), but only in some scenarios. May be, the condition -> -should be less strict here. -> -> -Why we need any condition here at all? Don't we want to activate -> -block-layer on target after migration anyway? -> -Hmm I'm not sure about the unconditional activation, since we at least -have to honor LATE_BLOCK_ACTIVATE cap if it's set (and probably delay it -in such a case). In current libvirt upstream I see such code: - -> -/* Migration capabilities which should always be enabled as long as they -> -> -* are supported by QEMU. If the capability is supposed to be enabled on both -> -> -* sides of migration, it won't be enabled unless both sides support it. -> -> -*/ -> -> -static const qemuMigrationParamsAlwaysOnItem qemuMigrationParamsAlwaysOn[] = -> -{ -> -> -{QEMU_MIGRATION_CAP_PAUSE_BEFORE_SWITCHOVER, -> -> -QEMU_MIGRATION_SOURCE}, -> -> -> -> -{QEMU_MIGRATION_CAP_LATE_BLOCK_ACTIVATE, -> -> -QEMU_MIGRATION_DESTINATION}, -> -> -}; -which means that libvirt always wants LATE_BLOCK_ACTIVATE to be set. - -The code from process_incoming_migration_bh() you're referring to: - -> -/* If capability late_block_activate is set: -> -> -* Only fire up the block code now if we're going to restart the -> -> -* VM, else 'cont' will do it. -> -> -* This causes file locking to happen; so we don't want it to happen -> -> -* unless we really are starting the VM. -> -> -*/ -> -> -if (!migrate_late_block_activate() || -> -> -(autostart && (!global_state_received() || -> -> -runstate_is_live(global_state_get_runstate())))) { -> -> -/* Make sure all file formats throw away their mutable metadata. -> -> -> -* If we get an error here, just don't restart the VM yet. 
*/ -> -> -bdrv_activate_all(&local_err); -> -> -if (local_err) { -> -> -error_report_err(local_err); -> -> -local_err = NULL; -> -> -autostart = false; -> -> -} -> -> -} -It states explicitly that we're either going to start VM right at this -point if (autostart == true), or we wait till "cont" command happens. -None of this is going to happen if we start another migration while -still being in PAUSED state. So I think it seems reasonable to take -such case into account. For instance, this patch does prevent the crash: - -> -diff --git a/migration/migration.c b/migration/migration.c -> -index ae2be31557..3222f6745b 100644 -> ---- a/migration/migration.c -> -+++ b/migration/migration.c -> -@@ -733,7 +733,8 @@ static void process_incoming_migration_bh(void *opaque) -> -*/ -> -if (!migrate_late_block_activate() || -> -(autostart && (!global_state_received() || -> -- runstate_is_live(global_state_get_runstate())))) { -> -+ runstate_is_live(global_state_get_runstate()))) || -> -+ (!autostart && global_state_get_runstate() == RUN_STATE_PAUSED)) { -> -/* Make sure all file formats throw away their mutable metadata. -> -* If we get an error here, just don't restart the VM yet. */ -> -bdrv_activate_all(&local_err); -What are your thoughts on it? - -Andrey - diff --git a/classification_output/01/mistranslation/8874178 b/classification_output/01/mistranslation/8874178 deleted file mode 100644 index 1ebfe288..00000000 --- a/classification_output/01/mistranslation/8874178 +++ /dev/null @@ -1,202 +0,0 @@ -mistranslation: 0.928 -other: 0.912 -instruction: 0.835 -semantic: 0.829 - -[Qemu-devel] [Bug?] Guest pause because VMPTRLD failed in KVM - -Hello, - - We encountered a problem that a guest paused because the KMOD report VMPTRLD -failed. 
The related information is as follows:

1) QEMU command:

/usr/bin/qemu-kvm -name omu1 -S -machine pc-i440fx-2.3,accel=kvm,usb=off \
  -cpu host -m 15625 -realtime mlock=off -smp 8,sockets=1,cores=8,threads=1 \
  -uuid a2aacfff-6583-48b4-b6a4-e6830e519931 -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/omu1.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
  -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
  -drive file=/home/env/guest1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0 \
  -drive file=/home/env/guest_300G.img,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 \
  -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:00:80:05:00:00,bus=pci.0,addr=0x3 \
  -netdev tap,fd=27,id=hostnet1,vhost=on,vhostfd=28 \
  -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:00:80:05:00:01,bus=pci.0,addr=0x4 \
  -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
  -device usb-tablet,id=input0 -vnc 0.0.0.0:0 \
  -device cirrus-vga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on

2) QEMU log:

KVM: entry failed, hardware error 0x4
RAX=00000000ffffffed RBX=ffff8803fa2d7fd8 RCX=0100000000000000 RDX=0000000000000000
RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8803fa2d7e90 RSP=ffff8803fa2efe90
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=000000000000b69a
R12=0000000000000001 R13=ffffffff81a25b40 R14=0000000000000000 R15=ffff8803fa2d7fd8
RIP=ffffffff81053e16 RFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00c00000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 ffffffff 00c00000
FS =0000 0000000000000000 ffffffff 00c00000
GS =0000 ffff88040f540000 ffffffff 00c00000
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0040 ffff88040f550a40 00002087 00008b00 DPL=0 TSS64-busy
GDT=     ffff88040f549000 0000007f
IDT=     ffffffffff529000 00000fff
CR0=80050033 CR2=00007f81ca0c5000 CR3=00000003f5081000 CR4=000407e0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

3) Dmesg:

[347315.028339] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed
klogd 1.4.1, ---------- state change ----------
[347315.039506] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed
[347315.051728] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed
[347315.057472] vmwrite error: reg 6c0a value ffff88307e66e480 (err -2120672384)
[347315.064567] Pid: 69523, comm: qemu-kvm Tainted: GF X 3.0.93-0.8-default #1
[347315.064569] Call Trace:
[347315.064587] [] dump_trace+0x75/0x300
[347315.064595] [] dump_stack+0x69/0x6f
[347315.064617] [] vmx_vcpu_load+0x11e/0x1d0 [kvm_intel]
[347315.064647] [] kvm_arch_vcpu_load+0x44/0x1d0 [kvm]
[347315.064669] [] finish_task_switch+0x81/0xe0
[347315.064676] [] thread_return+0x3b/0x2a7
[347315.064687] [] kvm_vcpu_block+0x65/0xa0 [kvm]
[347315.064703] [] __vcpu_run+0xd1/0x260 [kvm]
[347315.064732] [] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
[347315.064759] [] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
[347315.064771] [] do_vfs_ioctl+0x8b/0x3b0
[347315.064776] [] sys_ioctl+0xa1/0xb0
[347315.064783] [] system_call_fastpath+0x16/0x1b
[347315.064797]
[<00007fee51969ce7>] 0x7fee51969ce6
[347315.064799] vmwrite error: reg 6c0c value ffff88307e664000 (err -2120630272)
[347315.064802] Pid: 69523, comm: qemu-kvm Tainted: GF X 3.0.93-0.8-default #1
[347315.064803] Call Trace:
[347315.064807] [] dump_trace+0x75/0x300
[347315.064811] [] dump_stack+0x69/0x6f
[347315.064817] [] vmx_vcpu_load+0x12c/0x1d0 [kvm_intel]
[347315.064832] [] kvm_arch_vcpu_load+0x44/0x1d0 [kvm]
[347315.064851] [] finish_task_switch+0x81/0xe0
[347315.064855] [] thread_return+0x3b/0x2a7
[347315.064865] [] kvm_vcpu_block+0x65/0xa0 [kvm]
[347315.064880] [] __vcpu_run+0xd1/0x260 [kvm]
[347315.064907] [] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
[347315.064933] [] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
[347315.064943] [] do_vfs_ioctl+0x8b/0x3b0
[347315.064947] [] sys_ioctl+0xa1/0xb0
[347315.064951] [] system_call_fastpath+0x16/0x1b
[347315.064957] [<00007fee51969ce7>] 0x7fee51969ce6
[347315.064959] vmwrite error: reg 6c10 value 0 (err 0)

4) The issue can't be reproduced. I searched the Intel VMX spec for the
reasons a VMPTRLD can fail:

    The instruction fails if its operand is not properly aligned, sets
    unsupported physical-address bits, or is equal to the VMXON pointer.
    In addition, the instruction fails if the 32 bits in memory referenced
    by the operand do not match the VMCS revision identifier supported by
    this processor.

But I can't find any clues in the KVM source code. It seems each of these
error conditions is impossible in theory. :(

Any suggestions will be appreciated! Paolo?

--
Regards,
-Gonglei

On 10/11/2016 15:10, gong lei wrote:
> 4) The issue can't be reproduced. I searched the Intel VMX spec for the
> reasons a VMPTRLD can fail:
>
>     The instruction fails if its operand is not properly aligned, sets
>     unsupported physical-address bits, or is equal to the VMXON pointer.
>     In addition, the instruction fails if the 32 bits in memory
>     referenced by the operand do not match the VMCS revision identifier
>     supported by this processor.
>
> But I can't find any clues in the KVM source code. It seems each of
> these error conditions is impossible in theory. :(

Yes, it should not happen. :(

If it's not reproducible, it's really hard to say what it was, except a
random memory corruption elsewhere or even a bit flip (!).

Paolo

On 2016/11/17 20:39, Paolo Bonzini wrote:
> On 10/11/2016 15:10, gong lei wrote:
>> 4) The issue can't be reproduced. I searched the Intel VMX spec for
>> the reasons a VMPTRLD can fail:
>>
>>     The instruction fails if its operand is not properly aligned, sets
>>     unsupported physical-address bits, or is equal to the VMXON pointer.
>>     In addition, the instruction fails if the 32 bits in memory
>>     referenced by the operand do not match the VMCS revision identifier
>>     supported by this processor.
>>
>> But I can't find any clues in the KVM source code. It seems each of
>> these error conditions is impossible in theory. :(
>
> Yes, it should not happen. :(
>
> If it's not reproducible, it's really hard to say what it was, except a
> random memory corruption elsewhere or even a bit flip (!).
>
> Paolo

Thanks for your reply, Paolo :)

--
Regards,
-Gonglei

-- cgit 1.4.1
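[Editor's aside] The SDM failure conditions for VMPTRLD quoted in the thread above can be written down as a small predicate. This is only an illustrative sketch, not KVM code: the function name `vmptrld_would_fail`, its parameters, and the 4 KiB alignment constant are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch of the architectural VMPTRLD failure conditions
 * quoted from the Intel SDM in the thread above. All names here are
 * hypothetical; this is not KVM source code. */

#define VMCS_ALIGN 4096ULL  /* the VMCS pointer must be 4 KiB aligned */

static bool vmptrld_would_fail(uint64_t vmcs_pa, uint64_t vmxon_pa,
                               uint32_t vmcs_revision_in_memory,
                               uint32_t cpu_supported_revision,
                               unsigned int phys_addr_width)
{
    /* Bits above the CPU's physical-address width must be zero. */
    uint64_t pa_mask = ~((1ULL << phys_addr_width) - 1);

    if (vmcs_pa & (VMCS_ALIGN - 1))   /* operand not properly aligned */
        return true;
    if (vmcs_pa & pa_mask)            /* unsupported physical-address bits */
        return true;
    if (vmcs_pa == vmxon_pa)          /* operand equals the VMXON pointer */
        return true;
    if (vmcs_revision_in_memory != cpu_supported_revision)
        return true;                  /* VMCS revision identifier mismatch */
    return false;
}
```

As the thread notes, KVM already guarantees all four conditions by construction (page-aligned VMCS allocation, matching revision id), which is why a failure in practice points at memory corruption rather than a logic bug.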