Diffstat
-rw-r--r--   results/classifier/108/other/74          16
-rw-r--r--   results/classifier/108/other/740895      87
-rw-r--r--   results/classifier/108/other/741115      29
-rw-r--r--   results/classifier/108/other/741887     158
-rw-r--r--   results/classifier/108/other/742         59
-rw-r--r--   results/classifier/108/other/744         18
-rw-r--r--   results/classifier/108/other/74466963  1888
-rw-r--r--   results/classifier/108/other/744856      30
-rw-r--r--   results/classifier/108/other/745         51
-rw-r--r--   results/classifier/108/other/74545755   354
-rw-r--r--   results/classifier/108/other/746         16
-rw-r--r--   results/classifier/108/other/747         45
-rw-r--r--   results/classifier/108/other/747583      42
-rw-r--r--   results/classifier/108/other/748         16
-rw-r--r--   results/classifier/108/other/749         16
15 files changed, 2825 insertions, 0 deletions
diff --git a/results/classifier/108/other/74 b/results/classifier/108/other/74 new file mode 100644 index 000000000..a28d76369 --- /dev/null +++ b/results/classifier/108/other/74 @@ -0,0 +1,16 @@ +device: 0.912 +semantic: 0.653 +network: 0.607 +performance: 0.440 +other: 0.421 +boot: 0.360 +graphic: 0.244 +debug: 0.151 +socket: 0.141 +vnc: 0.070 +files: 0.016 +permissions: 0.015 +KVM: 0.008 +PID: 0.005 + +AUD_set_volume_out takes SWVoiceOut as parameter, but controls HWVoiceOut diff --git a/results/classifier/108/other/740895 b/results/classifier/108/other/740895 new file mode 100644 index 000000000..edba0076f --- /dev/null +++ b/results/classifier/108/other/740895 @@ -0,0 +1,87 @@ +graphic: 0.717 +PID: 0.655 +performance: 0.624 +device: 0.577 +socket: 0.531 +debug: 0.484 +permissions: 0.480 +vnc: 0.468 +semantic: 0.452 +other: 0.390 +files: 0.363 +boot: 0.348 +network: 0.233 +KVM: 0.137 + +qemu freezes when loading MS-DOS with EMM386.EXE NOEMS HIGHSCAN + +Qemu version used: 0.11.2 and 0.14.0 +Guest: MS-DOS 6.2 +Host: Ubuntu 10.04 with 2.6.32-29-generic SMP i686 +Starting Qemu with the command: qemu -hda dos.img -cpu 486 -m 16 + +When I start MS-DOS under Qemu with the option (in CONFIG.SYS) +DEVICE=C:\DOS\EMM386.EXE NOEMS HIGHSCAN +the guest freezes. +If I remove "HIGHSCAN", the system boots (but my software does not work). + +The whole thing works on a real computer with a 486 with 16 MB RAM or a PII. + +"HIGHSCAN switch allows EMM386.EXE to map expanded memory pages or upper memory blocks (UMBs) over portions of the upper memory area (UMA) used by system read-only memory" from http://support.microsoft.com/kb/96522/en-us + +I added some traces inside "default_ioport_read" in ioport.c, but I don't see any access to F000h-F7FFh as described in the Microsoft help article. + +Before the system hangs, there are accesses to dma1, the DMA page registers and dma2: + +inb : 0087 00 +outb: 000c 00 +inb : 0000 00 +inb : 0000 00 +inb : 0001 00 +inb : 0001 00 +inb : 0083 00 +outb: 000c 00 +inb : 0002 00 +inb : 0002 00 +inb : 0003 00 +inb : 0003 00 +inb : 0081 00 +outb: 000c 00 +inb : 0004 00 +inb : 0004 00 +inb : 0005 00 +inb : 0005 00 +inb : 0082 00 +outb: 000c 00 +inb : 0006 00 +inb : 0006 00 +inb : 0007 00 +inb : 0007 00 +inb : 008b 00 +outb: 00d8 00 +inb : 00c4 00 +inb : 00c4 00 +inb : 00c6 00 +inb : 00c6 00 +inb : 0089 00 +outb: 00d8 00 +inb : 00c8 00 +inb : 00c8 00 +inb : 00ca 00 +inb : 00ca 00 +inb : 008a 00 +outb: 00d8 00 +inb : 00cc 00 +inb : 00cc 00 +inb : 00ce 00 +inb : 00ce 00 +outb: 000c 00 +outb: 00d8 00 + +Triaging old bug tickets ... QEMU 0.11 and 0.14 are pretty much outdated nowadays... can you still reproduce this problem with the latest version of QEMU? + +[Expired for QEMU because there has been no activity for 60 days.] + +FYI I experienced hangs with emm386.exe (with NOEMS but not HIGHSCAN) using qemu 3.1.0 (from debian buster), but not with qemu 5.0.1 + + diff --git a/results/classifier/108/other/741115 b/results/classifier/108/other/741115 new file mode 100644 index 000000000..594bc6fed --- /dev/null +++ b/results/classifier/108/other/741115 @@ -0,0 +1,29 @@ +debug: 0.917 +graphic: 0.850 +device: 0.823 +performance: 0.736 +other: 0.686 +network: 0.654 +permissions: 0.634 +semantic: 0.609 +socket: 0.565 +vnc: 0.501 +boot: 0.469 +files: 0.426 +PID: 0.334 +KVM: 0.225 + +Add support for exposing coprocessor cp15, cp14 registers in the embedded gdb server + +Please add support for exposing ARM coprocessor registers/logic in the embedded gdb server, +for example the cp15, cp14, etc. registers.
+ +Related project http://jtagarmgdbsrvr.sourceforge.net/index.html + +Also filed a bug against GDB: http://sourceware.org/bugzilla/show_bug.cgi?id=12602 + +Since QEMU 3.0 the QEMU gdb stub supports read-only access to most cp14/cp15 registers. The gdb 'info all-registers' command will print all the registers (integer, fp and system registers), or individual registers can be read with commands like "print /x $SCTLR". + +Write access isn't supported (that is a lot trickier to do in a reliable way, since QEMU's internals assume that system registers can only be changed in the ways that the guest validly can, and there's no easy API to connect to the debug stub that would allow the user to change system registers in only safe ways). + + diff --git a/results/classifier/108/other/741887 b/results/classifier/108/other/741887 new file mode 100644 index 000000000..b41c690fc --- /dev/null +++ b/results/classifier/108/other/741887 @@ -0,0 +1,158 @@ +other: 0.746 +semantic: 0.695 +KVM: 0.693 +network: 0.663 +debug: 0.651 +graphic: 0.640 +device: 0.607 +performance: 0.591 +boot: 0.572 +PID: 0.568 +permissions: 0.513 +files: 0.509 +socket: 0.505 +vnc: 0.504 + +virsh snapshot-create too slow (kvm, qcow2, savevm) + +Action +====== +# time virsh snapshot-create 1 + +* Taking a snapshot of a running KVM virtual machine + +Result +====== +Domain snapshot 1300983161 created +real 4m46.994s +user 0m0.000s +sys 0m0.010s + +Expected result +=============== +* Snapshot taken after a few seconds instead of minutes. + +Environment +=========== +* Ubuntu Natty Narwhal upgraded from Lucid and Meerkat, fully updated. + +* Stock natty packages of libvirt and qemu installed (libvirt-bin 0.8.8-1ubuntu5; libvirt0 0.8.8-1ubuntu5; qemu-common 0.14.0+noroms-0ubuntu3; qemu-kvm 0.14.0+noroms-0ubuntu3). + +* Virtual machine disk format is qcow2 (debian 5 installed) +image: /storage/debian.qcow2 +file format: qcow2 +virtual size: 10G (10737418240 bytes) +disk size: 1.2G +cluster_size: 65536 +Snapshot list: +ID TAG VM SIZE DATE VM CLOCK +1 snap01 48M 2011-03-24 09:46:33 00:00:58.899 +2 1300979368 58M 2011-03-24 11:09:28 00:01:03.589 +3 1300983161 57M 2011-03-24 12:12:41 00:00:51.905 + +* The qcow2 disk is stored on an ext4 filesystem, without RAID or LVM or any special setup. + +* The running guest VM uses about 40M of RAM from the inside; from the outside, 576M are given to that machine. + +* The host has a fast dual-core Pentium CPU with virtualization support, around 8G of RAM and a 7200rpm hard drive (dd from urandom to a file gives about 20M/s). + +* Running processes: sshd, atd (empty), crond (empty), libvirtd, tmux, bash, rsyslogd, upstart-socket-bridge, udevd, dnsmasq, iotop (python) + +* Networking is done by bridging and bonding. + + +Detailed description +================== + +* Under root, the command 'virsh snapshot-create 1' is issued on a booted and running KVM machine with debian inside. + +* After about four minutes, the process is done. + +* 'iotop' shows two 'kvm' processes reading/writing to disk. The first one has IO around 1500 K/s, the second one around 400 K/s. That takes about three minutes. Then the first process grabs about 3 M/s of IO and suddenly disappears (1-2 sec). Then the second process does about 7.5 M/s of IO for around 1-2 minutes. + +* The snapshot is successfully created and is usable for reverting or extracting.
+ +* Pretty much the same behaviour occurs when the command 'savevm' is issued directly from the qemu monitor, without using libvirt at all (actually, virsh snapshot-create just sends 'savevm' to the monitor socket). + +* This behaviour was observed on lucid, meerkat, natty and even with a git version of libvirt (f44bfb7fb978c9313ce050a1c4149bf04aa0a670). Also the slowsave packages from https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/524447 gave this issue. + + +Thank you for helping to solve this issue! + +ProblemType: Bug +DistroRelease: Ubuntu 11.04 +Package: libvirt-bin 0.8.8-1ubuntu5 +ProcVersionSignature: Ubuntu 2.6.38-7.38-server 2.6.38 +Uname: Linux 2.6.38-7-server x86_64 +Architecture: amd64 +Date: Thu Mar 24 12:19:41 2011 +InstallationMedia: Ubuntu-Server 10.04.2 LTS "Lucid Lynx" - Release amd64 (20110211.1) +ProcEnviron: + LANG=en_US.UTF-8 + SHELL=/bin/bash +SourcePackage: libvirt +UpgradeStatus: No upgrade log present (probably fresh install) + + + +Yup, I can definitely reproduce this. + +The current upstream qemu.git from git://git.savannah.nongnu.org/qemu.git +also has the slow savevm. However, its loadvm takes only a few seconds. + + +savevm _is_ slow, because it's writing to a qcow2 file with full (meta)data allocation, which is terribly slow since 0.13 (and 0.12.5) unless you use cache=unsafe. It's the same slowdown as observed with the default cache mode when performing an operating system install into a freshly created qcow2 - it may take several hours. To verify, run `iostat -dkx 5' and see how busy (the last column) your disk is during the save - I suspect it'll be about 100%. + +Confirmed that doing + + + kvm -drive file=lucid.img,cache=unsafe,index=0,boot=on -m 512M -smp 2 -vnc :1 -monitor stdio + +and doing 'savevm savevm5' + +takes about 2 seconds. + +So, for fast savevm, 'cache=unsafe' is the workaround. Should this bug then be marked invalid, or 'wontfix'? + +I confirm that without the 'cache' option, I got the following results from iostat while doing 'savevm': + +Device: sda +rrqm/s: 0.00 +wrqm/s: 316.00 +r/s: 0.00 +w/s: 94.80 +rkB/s: 0.00 +wkB/s: 1541.60 +avgrq-sz: 32.52 +avgqu-sz: 0.98 +await: 10.32 +svctm: 10.10 +%util: 95.76 + +I also confirm that when the option 'cache=unsafe' is used, the snapshot (from the qemu monitor) is done as quickly as it should be (a few seconds). + +I am not sure if this is a solution or a workaround or just a closer description of the bug. + +http://libvirt.org/formatdomain.html#elementsDisks describes the option 'cache'. When I use that (cache="none") it spits out: + +error: Failed to create domain from vm.xml +error: internal error process exited while connecting to monitor: kvm: -drive file=/home/dum8d0g/vms/deb.qcow2,if=none,id=drive-ide0-0-0,boot=on,format=qcow2,cache=none: could not open disk image /home/dum8d0g/vms/deb.qcow2: Invalid argument + +When that option is removed, the domain is created successfully. I guess I have another bug report to file. + +So, for me, the issue is somehow solved from the qemu side. I think this could be marked as wontfix. + +In qemu 0.14 cache=writeback and cache=none are expected to perform well. The default cache=writethrough is a very conservative setting which is slow by design. I'm pretty sure that it has always been slow, even before 0.12.5. + +I think that the specific problem with savevm may be related to the VM state being saved in too small chunks. With cache=writethrough this will hurt most.
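For background on why cache=writethrough is slow by design: every guest write must reach stable storage before it completes, so small writes serialize on flush latency. Below is a minimal standalone sketch (not QEMU code; the file name and sizes are arbitrary) contrasting plain buffered writes with O_DSYNC writes, the closest POSIX analogue of writethrough semantics:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Time n 4 KiB writes; with O_DSYNC each write must reach stable
     * storage before returning, roughly what cache=writethrough implies
     * for the guest image. */
    static double time_writes(const char *path, int extra_flags, int n)
    {
        char buf[4096];
        struct timeval t0, t1;
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0600);
        if (fd < 0) { perror("open"); exit(1); }
        memset(buf, 0xab, sizeof(buf));
        gettimeofday(&t0, NULL);
        for (int i = 0; i < n; i++) {
            if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
                perror("write"); exit(1);
            }
        }
        gettimeofday(&t1, NULL);
        close(fd);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    }

    int main(void)
    {
        int n = 1024; /* 1024 x 4 KiB = 4 MiB of small writes */
        printf("buffered (like cache=writeback):   %.3f s\n",
               time_writes("bench.dat", 0, n));
        printf("O_DSYNC  (like cache=writethrough): %.3f s\n",
               time_writes("bench.dat", O_DSYNC, n));
        return 0;
    }

On a rotating disk the O_DSYNC run is typically orders of magnitude slower, which matches the ~96% disk utilisation reported by iostat above.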
+ +I had posted a patch to fix the issue before (http://patchwork.ozlabs.org/patch/64346/); saving the memory state is time consuming, and may take several minutes. + +@edison, + +if you want to push such a patch, please do it through upstream, since it is actually a new feature. + +I'm going to mark this 'wontfix' (as I thought I had done before), rather than invalid, though the latter still sounds accurate as well. + +Cool. It writes about 9 times the data of the actual snapshot size. + diff --git a/results/classifier/108/other/742 b/results/classifier/108/other/742 new file mode 100644 index 000000000..a607f8326 --- /dev/null +++ b/results/classifier/108/other/742 @@ -0,0 +1,59 @@ +other: 0.756 +semantic: 0.702 +vnc: 0.696 +PID: 0.673 +graphic: 0.668 +permissions: 0.598 +KVM: 0.588 +performance: 0.563 +device: 0.559 +debug: 0.529 +network: 0.507 +socket: 0.484 +boot: 0.437 +files: 0.384 + +Cache Layout wrong on many Zen Arch CPUs +Description of problem: +This is `coreinfo -l` when running Windows as host: + +[screenshot not included] + +This is `coreinfo -l` when running the same Windows as a guest with 6 cores and 6 threads (half of each): + +[screenshot not included] + +Steps to reproduce: +1. You need an AMD Ryzen 3900X. Its L3 cache is split across groups of 3 cores +2. Use `-cpu host,+topoext,host-cache-info=on` +3. Use `coreinfo -l` to see how the L3 cache is distributed +Additional information: +1. When running without `host-cache-info=on`, the L3 cache is spread across all the CPUs. +2. `lscpu -e`: + +``` +CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ MHZ + 0 0 0 0 0:0:0:0 yes 4672.0698 2200.0000 3800.000 + 1 0 0 1 1:1:1:0 yes 4672.0698 2200.0000 3800.000 + 2 0 0 2 2:2:2:0 yes 4672.0698 2200.0000 3800.000 + 3 0 0 3 4:4:4:1 yes 4672.0698 2200.0000 3800.000 + 4 0 0 4 5:5:5:1 yes 4672.0698 2200.0000 3800.000 + 5 0 0 5 6:6:6:1 yes 4672.0698 2200.0000 3800.000 + 6 0 0 6 8:8:8:2 yes 4672.0698 2200.0000 3800.000 + 7 0 0 7 9:9:9:2 yes 4672.0698 2200.0000 3610.580 + 8 0 0 8 10:10:10:2 yes 4672.0698 2200.0000 3800.000 + 9 0 0 9 12:12:12:3 yes 4672.0698 2200.0000 3800.000 + 10 0 0 10 13:13:13:3 yes 4672.0698 2200.0000 3800.000 + 11 0 0 11 14:14:14:3 yes 4672.0698 2200.0000 3800.000 + 12 0 0 0 0:0:0:0 yes 4672.0698 2200.0000 3800.000 + 13 0 0 1 1:1:1:0 yes 4672.0698 2200.0000 3800.000 + 14 0 0 2 2:2:2:0 yes 4672.0698 2200.0000 3800.000 + 15 0 0 3 4:4:4:1 yes 4672.0698 2200.0000 3800.000 + 16 0 0 4 5:5:5:1 yes 4672.0698 2200.0000 3800.000 + 17 0 0 5 6:6:6:1 yes 4672.0698 2200.0000 3800.000 + 18 0 0 6 8:8:8:2 yes 4672.0698 2200.0000 3800.000 + 19 0 0 7 9:9:9:2 yes 4672.0698 2200.0000 3800.000 + 20 0 0 8 10:10:10:2 yes 4672.0698 2200.0000 3800.000 + 21 0 0 9 12:12:12:3 yes 4672.0698 2200.0000 3800.000 + 22 0 0 10 13:13:13:3 yes 4672.0698 2200.0000 3800.000 + 23 0 0 11 14:14:14:3 yes 4672.0698 2200.0000 3800.000 +``` diff --git a/results/classifier/108/other/744 b/results/classifier/108/other/744 new file mode 100644 index 000000000..e3b0f6fd8 --- /dev/null +++ b/results/classifier/108/other/744 @@ -0,0 +1,18 @@ +graphic: 0.796 +device: 0.786 +socket: 0.707 +files: 0.700 +vnc: 0.665 +network: 0.630 +boot: 0.509 +permissions: 0.410 +semantic: 0.394 +performance: 0.272 +PID: 0.199 +other: 0.136 +debug: 0.123 +KVM: 0.081 + +ppc64: Implement the remaining PowerISA v3.1 instructions +Additional information: +[PowerISA_public.v3.1.pdf](https://wiki.raptorcs.com/w/images/f/f5/PowerISA_public.v3.1.pdf) diff --git a/results/classifier/108/other/74466963 b/results/classifier/108/other/74466963 new file mode 100644 index 000000000..55d41733b --- /dev/null +++ 
b/results/classifier/108/other/74466963 @@ -0,0 +1,1888 @@ +device: 0.909 +permissions: 0.907 +KVM: 0.903 +debug: 0.897 +files: 0.896 +graphic: 0.895 +boot: 0.894 +performance: 0.892 +semantic: 0.891 +PID: 0.886 +socket: 0.879 +vnc: 0.878 +other: 0.877 +network: 0.871 + +[Qemu-devel] [TCG only][Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration + +Hi all, + +Does anybody remember the similar issue posted by hailiang months ago: +http://patchwork.ozlabs.org/patch/454322/ +At least two bugs about migration have been fixed since then. +And now we found the same issue with a TCG VM (KVM is fine): after +migration, the content of the VM's memory is inconsistent. +We added a patch to check the memory content; you can find it appended below. + +steps to reproduce: +1) apply the patch and re-build qemu +2) prepare the ubuntu guest and run memtest in grub. +source side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off +destination side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 +3) start migration +with a 1000M NIC, migration will finish within 3 min. + +at source: +(qemu) migrate tcp:192.168.2.66:8881 +after saving ram complete +e9e725df678d392b1a83b3a917f332bb +qemu-system-x86_64: end ram md5 +(qemu) + +at destination: +...skip... +Completed load of VM with exit code 0 seq iteration 1264 +Completed load of VM with exit code 0 seq iteration 1265 +Completed load of VM with exit code 0 seq iteration 1266 +qemu-system-x86_64: after loading state section id 2(ram) +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 +qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init + +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 + +This occurs occasionally, and only on a TCG machine. It seems that +some pages dirtied on the source side aren't transferred to the destination. +This problem can be reproduced even if we disable virtio. +Is it OK for some pages not to be transferred to the destination during +migration? Or is it a bug? +Any idea...
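The check in the patch below boils down to hashing the entire 'pc.ram' RAMBlock on both sides and comparing the digests. Here is a standalone sketch of the same idea, using OpenSSL's MD5() rather than the clplumbing MD5 the patch links against (the buffer here is only a stand-in for a RAMBlock's host pointer and used_length):

    #include <stdio.h>
    #include <string.h>
    #include <openssl/md5.h>   /* build with -lcrypto; MD5() is deprecated
                                  in OpenSSL 3 but still available */

    int main(void)
    {
        /* Stand-in for block->host / block->used_length. */
        static unsigned char ram[4096];
        memset(ram, 0x5a, sizeof(ram));

        unsigned char md[MD5_DIGEST_LENGTH];
        MD5(ram, sizeof(ram), md);

        /* Print the digest exactly like the patch does, one hex byte
           at a time. */
        for (int i = 0; i < MD5_DIGEST_LENGTH; i++) {
            printf("%02x", md[i]);
        }
        printf("\n");
        return 0;
    }

If the two printed digests differ, at least one page's content diverged between source and destination.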
+ +=================md5 check patch============================= + +diff --git a/Makefile.target b/Makefile.target +index 962d004..e2cb8e9 100644 +--- a/Makefile.target ++++ b/Makefile.target +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o + obj-y += memory_mapping.o + obj-y += dump.o + obj-y += migration/ram.o migration/savevm.o +-LIBS := $(libs_softmmu) $(LIBS) ++LIBS := $(libs_softmmu) $(LIBS) -lplumb + + # xen support + obj-$(CONFIG_XEN) += xen-common.o +diff --git a/migration/ram.c b/migration/ram.c +index 1eb155a..3b7a09d 100644 +--- a/migration/ram.c ++++ b/migration/ram.c +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +version_id) +} + + rcu_read_unlock(); +- DPRINTF("Completed load of VM with exit code %d seq iteration " ++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " + "%" PRIu64 "\n", ret, seq_iter); + return ret; + } +diff --git a/migration/savevm.c b/migration/savevm.c +index 0ad1b93..3feaa61 100644 +--- a/migration/savevm.c ++++ b/migration/savevm.c +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) + + } + ++#include "exec/ram_addr.h" ++#include "qemu/rcu_queue.h" ++#include <clplumbing/md5.h> ++#ifndef MD5_DIGEST_LENGTH ++#define MD5_DIGEST_LENGTH 16 ++#endif ++ ++static void check_host_md5(void) ++{ ++ int i; ++ unsigned char md[MD5_DIGEST_LENGTH]; ++ rcu_read_lock(); ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +'pc.ram' block */ ++ rcu_read_unlock(); ++ ++ MD5(block->host, block->used_length, md); ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { ++ fprintf(stderr, "%02x", md[i]); ++ } ++ fprintf(stderr, "\n"); ++ error_report("end ram md5"); ++} ++ + void qemu_savevm_state_begin(QEMUFile *f, + const MigrationParams *params) + { +@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile +*f, bool iterable_only) +save_section_header(f, se, QEMU_VM_SECTION_END); + + ret = se->ops->save_live_complete_precopy(f, se->opaque); ++ ++ fprintf(stderr, "after saving %s complete\n", se->idstr); ++ check_host_md5(); ++ + trace_savevm_section_end(se->idstr, se->section_id, ret); + save_section_footer(f, se); + if (ret < 0) { +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +MigrationIncomingState *mis) +section_id, le->se->idstr); + return ret; + } ++ if (section_type == QEMU_VM_SECTION_END) { ++ error_report("after loading state section id %d(%s)", ++ section_id, le->se->idstr); ++ check_host_md5(); ++ } + if (!check_section_footer(f, le)) { + return -EINVAL; + } +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) + } + + cpu_synchronize_all_post_init(); ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); ++ check_host_md5(); + + return ret; + } + +* Li Zhijian (address@hidden) wrote: +> +Hi all, +> +> +Does anyboday remember the similar issue post by hailiang months ago +> +http://patchwork.ozlabs.org/patch/454322/ +> +At least tow bugs about migration had been fixed since that. +Yes, I wondered what happened to that. + +> +And now we found the same issue at the tcg vm(kvm is fine), after migration, +> +the content VM's memory is inconsistent. +Hmm, TCG only - I don't know much about that; but I guess something must +be accessing memory without using the proper macros/functions so +it doesn't mark it as dirty. + +> +we add a patch to check memory content, you can find it from affix +> +> +steps to reporduce: +> +1) apply the patch and re-build qemu +> +2) prepare the ubuntu guest and run memtest in grub. 
+> +soruce side: +> +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +pc-i440fx-2.3,accel=tcg,usb=off +> +> +destination side: +> +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 +> +> +3) start migration +> +with 1000M NIC, migration will finish within 3 min. +> +> +at source: +> +(qemu) migrate tcp:192.168.2.66:8881 +> +after saving ram complete +> +e9e725df678d392b1a83b3a917f332bb +> +qemu-system-x86_64: end ram md5 +> +(qemu) +> +> +at destination: +> +...skip... +> +Completed load of VM with exit code 0 seq iteration 1264 +> +Completed load of VM with exit code 0 seq iteration 1265 +> +Completed load of VM with exit code 0 seq iteration 1266 +> +qemu-system-x86_64: after loading state section id 2(ram) +> +49c2dac7bde0e5e22db7280dcb3824f9 +> +qemu-system-x86_64: end ram md5 +> +qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init +> +> +49c2dac7bde0e5e22db7280dcb3824f9 +> +qemu-system-x86_64: end ram md5 +> +> +This occurs occasionally and only at tcg machine. It seems that +> +some pages dirtied in source side don't transferred to destination. +> +This problem can be reproduced even if we disable virtio. +> +> +Is it OK for some pages that not transferred to destination when do +> +migration ? Or is it a bug? +I'm pretty sure that means it's a bug. Hard to find though, I guess +at least memtest is smaller than a big OS. I think I'd dump the whole +of memory on both sides, hexdump and diff them - I'd guess it would +just be one byte/word different, maybe that would offer some idea what +wrote it. + +Dave + +> +Any idea... 
+> +> +=================md5 check patch============================= +> +> +diff --git a/Makefile.target b/Makefile.target +> +index 962d004..e2cb8e9 100644 +> +--- a/Makefile.target +> ++++ b/Makefile.target +> +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o +> +obj-y += memory_mapping.o +> +obj-y += dump.o +> +obj-y += migration/ram.o migration/savevm.o +> +-LIBS := $(libs_softmmu) $(LIBS) +> ++LIBS := $(libs_softmmu) $(LIBS) -lplumb +> +> +# xen support +> +obj-$(CONFIG_XEN) += xen-common.o +> +diff --git a/migration/ram.c b/migration/ram.c +> +index 1eb155a..3b7a09d 100644 +> +--- a/migration/ram.c +> ++++ b/migration/ram.c +> +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +> +version_id) +> +} +> +> +rcu_read_unlock(); +> +- DPRINTF("Completed load of VM with exit code %d seq iteration " +> ++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " +> +"%" PRIu64 "\n", ret, seq_iter); +> +return ret; +> +} +> +diff --git a/migration/savevm.c b/migration/savevm.c +> +index 0ad1b93..3feaa61 100644 +> +--- a/migration/savevm.c +> ++++ b/migration/savevm.c +> +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) +> +> +} +> +> ++#include "exec/ram_addr.h" +> ++#include "qemu/rcu_queue.h" +> ++#include <clplumbing/md5.h> +> ++#ifndef MD5_DIGEST_LENGTH +> ++#define MD5_DIGEST_LENGTH 16 +> ++#endif +> ++ +> ++static void check_host_md5(void) +> ++{ +> ++ int i; +> ++ unsigned char md[MD5_DIGEST_LENGTH]; +> ++ rcu_read_lock(); +> ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +> +'pc.ram' block */ +> ++ rcu_read_unlock(); +> ++ +> ++ MD5(block->host, block->used_length, md); +> ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { +> ++ fprintf(stderr, "%02x", md[i]); +> ++ } +> ++ fprintf(stderr, "\n"); +> ++ error_report("end ram md5"); +> ++} +> ++ +> +void qemu_savevm_state_begin(QEMUFile *f, +> +const MigrationParams *params) +> +{ +> +@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, +> +bool iterable_only) +> +save_section_header(f, se, QEMU_VM_SECTION_END); +> +> +ret = se->ops->save_live_complete_precopy(f, se->opaque); +> ++ +> ++ fprintf(stderr, "after saving %s complete\n", se->idstr); +> ++ check_host_md5(); +> ++ +> +trace_savevm_section_end(se->idstr, se->section_id, ret); +> +save_section_footer(f, se); +> +if (ret < 0) { +> +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +> +MigrationIncomingState *mis) +> +section_id, le->se->idstr); +> +return ret; +> +} +> ++ if (section_type == QEMU_VM_SECTION_END) { +> ++ error_report("after loading state section id %d(%s)", +> ++ section_id, le->se->idstr); +> ++ check_host_md5(); +> ++ } +> +if (!check_section_footer(f, le)) { +> +return -EINVAL; +> +} +> +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) +> +} +> +> +cpu_synchronize_all_post_init(); +> ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); +> ++ check_host_md5(); +> +> +return ret; +> +} +> +> +> +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +On 2015/12/3 17:24, Dr. David Alan Gilbert wrote: +* Li Zhijian (address@hidden) wrote: +Hi all, + +Does anyboday remember the similar issue post by hailiang months ago +http://patchwork.ozlabs.org/patch/454322/ +At least tow bugs about migration had been fixed since that. +Yes, I wondered what happened to that. +And now we found the same issue at the tcg vm(kvm is fine), after migration, +the content VM's memory is inconsistent. 
+Hmm, TCG only - I don't know much about that; but I guess something must +be accessing memory without using the proper macros/functions so +it doesn't mark it as dirty. +we add a patch to check memory content, you can find it from affix + +steps to reporduce: +1) apply the patch and re-build qemu +2) prepare the ubuntu guest and run memtest in grub. +soruce side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off + +destination side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 + +3) start migration +with 1000M NIC, migration will finish within 3 min. + +at source: +(qemu) migrate tcp:192.168.2.66:8881 +after saving ram complete +e9e725df678d392b1a83b3a917f332bb +qemu-system-x86_64: end ram md5 +(qemu) + +at destination: +...skip... +Completed load of VM with exit code 0 seq iteration 1264 +Completed load of VM with exit code 0 seq iteration 1265 +Completed load of VM with exit code 0 seq iteration 1266 +qemu-system-x86_64: after loading state section id 2(ram) +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 +qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init + +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 + +This occurs occasionally and only at tcg machine. It seems that +some pages dirtied in source side don't transferred to destination. +This problem can be reproduced even if we disable virtio. + +Is it OK for some pages that not transferred to destination when do +migration ? Or is it a bug? +I'm pretty sure that means it's a bug. Hard to find though, I guess +at least memtest is smaller than a big OS. I think I'd dump the whole +of memory on both sides, hexdump and diff them - I'd guess it would +just be one byte/word different, maybe that would offer some idea what +wrote it. +Maybe one better way to do that is with the help of userfaultfd's write-protect +capability. It is still in the development by Andrea Arcangeli, but there +is a RFC version available, please refer to +http://www.spinics.net/lists/linux-mm/msg97422.html +(I'm developing live memory snapshot which based on it, maybe this is another +scene where we +can use userfaultfd's WP ;) ). +Dave +Any idea... 
+ +=================md5 check patch============================= + +diff --git a/Makefile.target b/Makefile.target +index 962d004..e2cb8e9 100644 +--- a/Makefile.target ++++ b/Makefile.target +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o + obj-y += memory_mapping.o + obj-y += dump.o + obj-y += migration/ram.o migration/savevm.o +-LIBS := $(libs_softmmu) $(LIBS) ++LIBS := $(libs_softmmu) $(LIBS) -lplumb + + # xen support + obj-$(CONFIG_XEN) += xen-common.o +diff --git a/migration/ram.c b/migration/ram.c +index 1eb155a..3b7a09d 100644 +--- a/migration/ram.c ++++ b/migration/ram.c +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +version_id) + } + + rcu_read_unlock(); +- DPRINTF("Completed load of VM with exit code %d seq iteration " ++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " + "%" PRIu64 "\n", ret, seq_iter); + return ret; + } +diff --git a/migration/savevm.c b/migration/savevm.c +index 0ad1b93..3feaa61 100644 +--- a/migration/savevm.c ++++ b/migration/savevm.c +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) + + } + ++#include "exec/ram_addr.h" ++#include "qemu/rcu_queue.h" ++#include <clplumbing/md5.h> ++#ifndef MD5_DIGEST_LENGTH ++#define MD5_DIGEST_LENGTH 16 ++#endif ++ ++static void check_host_md5(void) ++{ ++ int i; ++ unsigned char md[MD5_DIGEST_LENGTH]; ++ rcu_read_lock(); ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +'pc.ram' block */ ++ rcu_read_unlock(); ++ ++ MD5(block->host, block->used_length, md); ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { ++ fprintf(stderr, "%02x", md[i]); ++ } ++ fprintf(stderr, "\n"); ++ error_report("end ram md5"); ++} ++ + void qemu_savevm_state_begin(QEMUFile *f, + const MigrationParams *params) + { +@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, +bool iterable_only) + save_section_header(f, se, QEMU_VM_SECTION_END); + + ret = se->ops->save_live_complete_precopy(f, se->opaque); ++ ++ fprintf(stderr, "after saving %s complete\n", se->idstr); ++ check_host_md5(); ++ + trace_savevm_section_end(se->idstr, se->section_id, ret); + save_section_footer(f, se); + if (ret < 0) { +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +MigrationIncomingState *mis) + section_id, le->se->idstr); + return ret; + } ++ if (section_type == QEMU_VM_SECTION_END) { ++ error_report("after loading state section id %d(%s)", ++ section_id, le->se->idstr); ++ check_host_md5(); ++ } + if (!check_section_footer(f, le)) { + return -EINVAL; + } +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) + } + + cpu_synchronize_all_post_init(); ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); ++ check_host_md5(); + + return ret; + } +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +. + +On 12/03/2015 05:37 PM, Hailiang Zhang wrote: +On 2015/12/3 17:24, Dr. David Alan Gilbert wrote: +* Li Zhijian (address@hidden) wrote: +Hi all, + +Does anyboday remember the similar issue post by hailiang months ago +http://patchwork.ozlabs.org/patch/454322/ +At least tow bugs about migration had been fixed since that. +Yes, I wondered what happened to that. +And now we found the same issue at the tcg vm(kvm is fine), after +migration, +the content VM's memory is inconsistent. +Hmm, TCG only - I don't know much about that; but I guess something must +be accessing memory without using the proper macros/functions so +it doesn't mark it as dirty. 
+we add a patch to check memory content, you can find it from affix + +steps to reporduce: +1) apply the patch and re-build qemu +2) prepare the ubuntu guest and run memtest in grub. +soruce side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 + +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off + +destination side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 + +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 + +3) start migration +with 1000M NIC, migration will finish within 3 min. + +at source: +(qemu) migrate tcp:192.168.2.66:8881 +after saving ram complete +e9e725df678d392b1a83b3a917f332bb +qemu-system-x86_64: end ram md5 +(qemu) + +at destination: +...skip... +Completed load of VM with exit code 0 seq iteration 1264 +Completed load of VM with exit code 0 seq iteration 1265 +Completed load of VM with exit code 0 seq iteration 1266 +qemu-system-x86_64: after loading state section id 2(ram) +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 +qemu-system-x86_64: qemu_loadvm_state: after +cpu_synchronize_all_post_init + +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 + +This occurs occasionally and only at tcg machine. It seems that +some pages dirtied in source side don't transferred to destination. +This problem can be reproduced even if we disable virtio. + +Is it OK for some pages that not transferred to destination when do +migration ? Or is it a bug? +I'm pretty sure that means it's a bug. Hard to find though, I guess +at least memtest is smaller than a big OS. I think I'd dump the whole +of memory on both sides, hexdump and diff them - I'd guess it would +just be one byte/word different, maybe that would offer some idea what +wrote it. +Maybe one better way to do that is with the help of userfaultfd's +write-protect +capability. It is still in the development by Andrea Arcangeli, but there +is a RFC version available, please refer to +http://www.spinics.net/lists/linux-mm/msg97422.html +(I'm developing live memory snapshot which based on it, maybe this is +another scene where we +can use userfaultfd's WP ;) ). +sounds good. + +thanks +Li +Dave +Any idea... 
+ +=================md5 check patch============================= + +diff --git a/Makefile.target b/Makefile.target +index 962d004..e2cb8e9 100644 +--- a/Makefile.target ++++ b/Makefile.target +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o + obj-y += memory_mapping.o + obj-y += dump.o + obj-y += migration/ram.o migration/savevm.o +-LIBS := $(libs_softmmu) $(LIBS) ++LIBS := $(libs_softmmu) $(LIBS) -lplumb + + # xen support + obj-$(CONFIG_XEN) += xen-common.o +diff --git a/migration/ram.c b/migration/ram.c +index 1eb155a..3b7a09d 100644 +--- a/migration/ram.c ++++ b/migration/ram.c +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +version_id) + } + + rcu_read_unlock(); +- DPRINTF("Completed load of VM with exit code %d seq iteration " ++ fprintf(stderr, "Completed load of VM with exit code %d seq +iteration " + "%" PRIu64 "\n", ret, seq_iter); + return ret; + } +diff --git a/migration/savevm.c b/migration/savevm.c +index 0ad1b93..3feaa61 100644 +--- a/migration/savevm.c ++++ b/migration/savevm.c +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) + + } + ++#include "exec/ram_addr.h" ++#include "qemu/rcu_queue.h" ++#include <clplumbing/md5.h> ++#ifndef MD5_DIGEST_LENGTH ++#define MD5_DIGEST_LENGTH 16 ++#endif ++ ++static void check_host_md5(void) ++{ ++ int i; ++ unsigned char md[MD5_DIGEST_LENGTH]; ++ rcu_read_lock(); ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +'pc.ram' block */ ++ rcu_read_unlock(); ++ ++ MD5(block->host, block->used_length, md); ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { ++ fprintf(stderr, "%02x", md[i]); ++ } ++ fprintf(stderr, "\n"); ++ error_report("end ram md5"); ++} ++ + void qemu_savevm_state_begin(QEMUFile *f, + const MigrationParams *params) + { +@@ -1056,6 +1079,10 @@ void +qemu_savevm_state_complete_precopy(QEMUFile *f, +bool iterable_only) + save_section_header(f, se, QEMU_VM_SECTION_END); + + ret = se->ops->save_live_complete_precopy(f, se->opaque); ++ ++ fprintf(stderr, "after saving %s complete\n", se->idstr); ++ check_host_md5(); ++ + trace_savevm_section_end(se->idstr, se->section_id, ret); + save_section_footer(f, se); + if (ret < 0) { +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +MigrationIncomingState *mis) + section_id, le->se->idstr); + return ret; + } ++ if (section_type == QEMU_VM_SECTION_END) { ++ error_report("after loading state section id %d(%s)", ++ section_id, le->se->idstr); ++ check_host_md5(); ++ } + if (!check_section_footer(f, le)) { + return -EINVAL; + } +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) + } + + cpu_synchronize_all_post_init(); ++ error_report("%s: after cpu_synchronize_all_post_init\n", +__func__); ++ check_host_md5(); + + return ret; + } +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +. +. +-- +Best regards. +Li Zhijian (8555) + +On 12/03/2015 05:24 PM, Dr. David Alan Gilbert wrote: +* Li Zhijian (address@hidden) wrote: +Hi all, + +Does anyboday remember the similar issue post by hailiang months ago +http://patchwork.ozlabs.org/patch/454322/ +At least tow bugs about migration had been fixed since that. +Yes, I wondered what happened to that. +And now we found the same issue at the tcg vm(kvm is fine), after migration, +the content VM's memory is inconsistent. +Hmm, TCG only - I don't know much about that; but I guess something must +be accessing memory without using the proper macros/functions so +it doesn't mark it as dirty. 
+we add a patch to check memory content, you can find it from affix + +steps to reporduce: +1) apply the patch and re-build qemu +2) prepare the ubuntu guest and run memtest in grub. +soruce side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off + +destination side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 + +3) start migration +with 1000M NIC, migration will finish within 3 min. + +at source: +(qemu) migrate tcp:192.168.2.66:8881 +after saving ram complete +e9e725df678d392b1a83b3a917f332bb +qemu-system-x86_64: end ram md5 +(qemu) + +at destination: +...skip... +Completed load of VM with exit code 0 seq iteration 1264 +Completed load of VM with exit code 0 seq iteration 1265 +Completed load of VM with exit code 0 seq iteration 1266 +qemu-system-x86_64: after loading state section id 2(ram) +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 +qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init + +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 + +This occurs occasionally and only at tcg machine. It seems that +some pages dirtied in source side don't transferred to destination. +This problem can be reproduced even if we disable virtio. + +Is it OK for some pages that not transferred to destination when do +migration ? Or is it a bug? +I'm pretty sure that means it's a bug. Hard to find though, I guess +at least memtest is smaller than a big OS. I think I'd dump the whole +of memory on both sides, hexdump and diff them - I'd guess it would +just be one byte/word different, maybe that would offer some idea what +wrote it. +I try to dump and compare them, more than 10 pages are different. +in source side, they are random value rather than always 'FF' 'FB' 'EF' +'BF'... in destination. +and not all of the different pages are continuous. + +thanks +Li +Dave +Any idea... 
+ +=================md5 check patch============================= + +diff --git a/Makefile.target b/Makefile.target +index 962d004..e2cb8e9 100644 +--- a/Makefile.target ++++ b/Makefile.target +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o + obj-y += memory_mapping.o + obj-y += dump.o + obj-y += migration/ram.o migration/savevm.o +-LIBS := $(libs_softmmu) $(LIBS) ++LIBS := $(libs_softmmu) $(LIBS) -lplumb + + # xen support + obj-$(CONFIG_XEN) += xen-common.o +diff --git a/migration/ram.c b/migration/ram.c +index 1eb155a..3b7a09d 100644 +--- a/migration/ram.c ++++ b/migration/ram.c +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +version_id) + } + + rcu_read_unlock(); +- DPRINTF("Completed load of VM with exit code %d seq iteration " ++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " + "%" PRIu64 "\n", ret, seq_iter); + return ret; + } +diff --git a/migration/savevm.c b/migration/savevm.c +index 0ad1b93..3feaa61 100644 +--- a/migration/savevm.c ++++ b/migration/savevm.c +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) + + } + ++#include "exec/ram_addr.h" ++#include "qemu/rcu_queue.h" ++#include <clplumbing/md5.h> ++#ifndef MD5_DIGEST_LENGTH ++#define MD5_DIGEST_LENGTH 16 ++#endif ++ ++static void check_host_md5(void) ++{ ++ int i; ++ unsigned char md[MD5_DIGEST_LENGTH]; ++ rcu_read_lock(); ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +'pc.ram' block */ ++ rcu_read_unlock(); ++ ++ MD5(block->host, block->used_length, md); ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { ++ fprintf(stderr, "%02x", md[i]); ++ } ++ fprintf(stderr, "\n"); ++ error_report("end ram md5"); ++} ++ + void qemu_savevm_state_begin(QEMUFile *f, + const MigrationParams *params) + { +@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, +bool iterable_only) + save_section_header(f, se, QEMU_VM_SECTION_END); + + ret = se->ops->save_live_complete_precopy(f, se->opaque); ++ ++ fprintf(stderr, "after saving %s complete\n", se->idstr); ++ check_host_md5(); ++ + trace_savevm_section_end(se->idstr, se->section_id, ret); + save_section_footer(f, se); + if (ret < 0) { +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +MigrationIncomingState *mis) + section_id, le->se->idstr); + return ret; + } ++ if (section_type == QEMU_VM_SECTION_END) { ++ error_report("after loading state section id %d(%s)", ++ section_id, le->se->idstr); ++ check_host_md5(); ++ } + if (!check_section_footer(f, le)) { + return -EINVAL; + } +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) + } + + cpu_synchronize_all_post_init(); ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); ++ check_host_md5(); + + return ret; + } +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + + +. +-- +Best regards. +Li Zhijian (8555) + +* Li Zhijian (address@hidden) wrote: +> +> +> +On 12/03/2015 05:24 PM, Dr. David Alan Gilbert wrote: +> +>* Li Zhijian (address@hidden) wrote: +> +>>Hi all, +> +>> +> +>>Does anyboday remember the similar issue post by hailiang months ago +> +>> +http://patchwork.ozlabs.org/patch/454322/ +> +>>At least tow bugs about migration had been fixed since that. +> +> +> +>Yes, I wondered what happened to that. +> +> +> +>>And now we found the same issue at the tcg vm(kvm is fine), after migration, +> +>>the content VM's memory is inconsistent. 
+> +> +> +>Hmm, TCG only - I don't know much about that; but I guess something must +> +>be accessing memory without using the proper macros/functions so +> +>it doesn't mark it as dirty. +> +> +> +>>we add a patch to check memory content, you can find it from affix +> +>> +> +>>steps to reporduce: +> +>>1) apply the patch and re-build qemu +> +>>2) prepare the ubuntu guest and run memtest in grub. +> +>>soruce side: +> +>>x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +>>e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +>>if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +>>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +>>-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +>>tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +>>pc-i440fx-2.3,accel=tcg,usb=off +> +>> +> +>>destination side: +> +>>x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +>>e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +>>if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +>>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +>>-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +>>tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +>>pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 +> +>> +> +>>3) start migration +> +>>with 1000M NIC, migration will finish within 3 min. +> +>> +> +>>at source: +> +>>(qemu) migrate tcp:192.168.2.66:8881 +> +>>after saving ram complete +> +>>e9e725df678d392b1a83b3a917f332bb +> +>>qemu-system-x86_64: end ram md5 +> +>>(qemu) +> +>> +> +>>at destination: +> +>>...skip... +> +>>Completed load of VM with exit code 0 seq iteration 1264 +> +>>Completed load of VM with exit code 0 seq iteration 1265 +> +>>Completed load of VM with exit code 0 seq iteration 1266 +> +>>qemu-system-x86_64: after loading state section id 2(ram) +> +>>49c2dac7bde0e5e22db7280dcb3824f9 +> +>>qemu-system-x86_64: end ram md5 +> +>>qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init +> +>> +> +>>49c2dac7bde0e5e22db7280dcb3824f9 +> +>>qemu-system-x86_64: end ram md5 +> +>> +> +>>This occurs occasionally and only at tcg machine. It seems that +> +>>some pages dirtied in source side don't transferred to destination. +> +>>This problem can be reproduced even if we disable virtio. +> +>> +> +>>Is it OK for some pages that not transferred to destination when do +> +>>migration ? Or is it a bug? +> +> +> +>I'm pretty sure that means it's a bug. Hard to find though, I guess +> +>at least memtest is smaller than a big OS. I think I'd dump the whole +> +>of memory on both sides, hexdump and diff them - I'd guess it would +> +>just be one byte/word different, maybe that would offer some idea what +> +>wrote it. +> +> +I try to dump and compare them, more than 10 pages are different. +> +in source side, they are random value rather than always 'FF' 'FB' 'EF' +> +'BF'... in destination. +> +> +and not all of the different pages are continuous. +I wonder if it happens on all of memtest's different test patterns, +perhaps it might be possible to narrow it down if you tell memtest +to only run one test at a time. + +Dave + +> +> +thanks +> +Li +> +> +> +> +> +>Dave +> +> +> +>>Any idea... 
+> +>> +> +>>=================md5 check patch============================= +> +>> +> +>>diff --git a/Makefile.target b/Makefile.target +> +>>index 962d004..e2cb8e9 100644 +> +>>--- a/Makefile.target +> +>>+++ b/Makefile.target +> +>>@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o +> +>> obj-y += memory_mapping.o +> +>> obj-y += dump.o +> +>> obj-y += migration/ram.o migration/savevm.o +> +>>-LIBS := $(libs_softmmu) $(LIBS) +> +>>+LIBS := $(libs_softmmu) $(LIBS) -lplumb +> +>> +> +>> # xen support +> +>> obj-$(CONFIG_XEN) += xen-common.o +> +>>diff --git a/migration/ram.c b/migration/ram.c +> +>>index 1eb155a..3b7a09d 100644 +> +>>--- a/migration/ram.c +> +>>+++ b/migration/ram.c +> +>>@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +> +>>version_id) +> +>> } +> +>> +> +>> rcu_read_unlock(); +> +>>- DPRINTF("Completed load of VM with exit code %d seq iteration " +> +>>+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " +> +>> "%" PRIu64 "\n", ret, seq_iter); +> +>> return ret; +> +>> } +> +>>diff --git a/migration/savevm.c b/migration/savevm.c +> +>>index 0ad1b93..3feaa61 100644 +> +>>--- a/migration/savevm.c +> +>>+++ b/migration/savevm.c +> +>>@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) +> +>> +> +>> } +> +>> +> +>>+#include "exec/ram_addr.h" +> +>>+#include "qemu/rcu_queue.h" +> +>>+#include <clplumbing/md5.h> +> +>>+#ifndef MD5_DIGEST_LENGTH +> +>>+#define MD5_DIGEST_LENGTH 16 +> +>>+#endif +> +>>+ +> +>>+static void check_host_md5(void) +> +>>+{ +> +>>+ int i; +> +>>+ unsigned char md[MD5_DIGEST_LENGTH]; +> +>>+ rcu_read_lock(); +> +>>+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +> +>>'pc.ram' block */ +> +>>+ rcu_read_unlock(); +> +>>+ +> +>>+ MD5(block->host, block->used_length, md); +> +>>+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { +> +>>+ fprintf(stderr, "%02x", md[i]); +> +>>+ } +> +>>+ fprintf(stderr, "\n"); +> +>>+ error_report("end ram md5"); +> +>>+} +> +>>+ +> +>> void qemu_savevm_state_begin(QEMUFile *f, +> +>> const MigrationParams *params) +> +>> { +> +>>@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, +> +>>bool iterable_only) +> +>> save_section_header(f, se, QEMU_VM_SECTION_END); +> +>> +> +>> ret = se->ops->save_live_complete_precopy(f, se->opaque); +> +>>+ +> +>>+ fprintf(stderr, "after saving %s complete\n", se->idstr); +> +>>+ check_host_md5(); +> +>>+ +> +>> trace_savevm_section_end(se->idstr, se->section_id, ret); +> +>> save_section_footer(f, se); +> +>> if (ret < 0) { +> +>>@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +> +>>MigrationIncomingState *mis) +> +>> section_id, le->se->idstr); +> +>> return ret; +> +>> } +> +>>+ if (section_type == QEMU_VM_SECTION_END) { +> +>>+ error_report("after loading state section id %d(%s)", +> +>>+ section_id, le->se->idstr); +> +>>+ check_host_md5(); +> +>>+ } +> +>> if (!check_section_footer(f, le)) { +> +>> return -EINVAL; +> +>> } +> +>>@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) +> +>> } +> +>> +> +>> cpu_synchronize_all_post_init(); +> +>>+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); +> +>>+ check_host_md5(); +> +>> +> +>> return ret; +> +>> } +> +>> +> +>> +> +>> +> +>-- +> +>Dr. David Alan Gilbert / address@hidden / Manchester, UK +> +> +> +> +> +>. +> +> +> +> +-- +> +Best regards. +> +Li Zhijian (8555) +> +> +-- +Dr. 
David Alan Gilbert / address@hidden / Manchester, UK + +Li Zhijian <address@hidden> wrote: +> +Hi all, +> +> +Does anyboday remember the similar issue post by hailiang months ago +> +http://patchwork.ozlabs.org/patch/454322/ +> +At least tow bugs about migration had been fixed since that. +> +> +And now we found the same issue at the tcg vm(kvm is fine), after +> +migration, the content VM's memory is inconsistent. +> +> +we add a patch to check memory content, you can find it from affix +> +> +steps to reporduce: +> +1) apply the patch and re-build qemu +> +2) prepare the ubuntu guest and run memtest in grub. +> +soruce side: +> +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +pc-i440fx-2.3,accel=tcg,usb=off +> +> +destination side: +> +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 +> +> +3) start migration +> +with 1000M NIC, migration will finish within 3 min. +> +> +at source: +> +(qemu) migrate tcp:192.168.2.66:8881 +> +after saving ram complete +> +e9e725df678d392b1a83b3a917f332bb +> +qemu-system-x86_64: end ram md5 +> +(qemu) +> +> +at destination: +> +...skip... +> +Completed load of VM with exit code 0 seq iteration 1264 +> +Completed load of VM with exit code 0 seq iteration 1265 +> +Completed load of VM with exit code 0 seq iteration 1266 +> +qemu-system-x86_64: after loading state section id 2(ram) +> +49c2dac7bde0e5e22db7280dcb3824f9 +> +qemu-system-x86_64: end ram md5 +> +qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init +> +> +49c2dac7bde0e5e22db7280dcb3824f9 +> +qemu-system-x86_64: end ram md5 +> +> +This occurs occasionally and only at tcg machine. It seems that +> +some pages dirtied in source side don't transferred to destination. +> +This problem can be reproduced even if we disable virtio. +> +> +Is it OK for some pages that not transferred to destination when do +> +migration ? Or is it a bug? +> +> +Any idea... +Thanks for describing how to reproduce the bug. +If some pages are not transferred to destination then it is a bug, so we +need to know what the problem is, notice that the problem can be that +TCG is not marking dirty some page, that Migration code "forgets" about +that page, or anything eles altogether, that is what we need to find. + +There are more posibilities, I am not sure that memtest is on 32bit +mode, and it is inside posibility that we are missing some state when we +are on real mode. + +Will try to take a look at this. + +THanks, again. 
+ + +> +> +=================md5 check patch============================= +> +> +diff --git a/Makefile.target b/Makefile.target +> +index 962d004..e2cb8e9 100644 +> +--- a/Makefile.target +> ++++ b/Makefile.target +> +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o +> +obj-y += memory_mapping.o +> +obj-y += dump.o +> +obj-y += migration/ram.o migration/savevm.o +> +-LIBS := $(libs_softmmu) $(LIBS) +> ++LIBS := $(libs_softmmu) $(LIBS) -lplumb +> +> +# xen support +> +obj-$(CONFIG_XEN) += xen-common.o +> +diff --git a/migration/ram.c b/migration/ram.c +> +index 1eb155a..3b7a09d 100644 +> +--- a/migration/ram.c +> ++++ b/migration/ram.c +> +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, +> +int version_id) +> +} +> +> +rcu_read_unlock(); +> +- DPRINTF("Completed load of VM with exit code %d seq iteration " +> ++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " +> +"%" PRIu64 "\n", ret, seq_iter); +> +return ret; +> +} +> +diff --git a/migration/savevm.c b/migration/savevm.c +> +index 0ad1b93..3feaa61 100644 +> +--- a/migration/savevm.c +> ++++ b/migration/savevm.c +> +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) +> +> +} +> +> ++#include "exec/ram_addr.h" +> ++#include "qemu/rcu_queue.h" +> ++#include <clplumbing/md5.h> +> ++#ifndef MD5_DIGEST_LENGTH +> ++#define MD5_DIGEST_LENGTH 16 +> ++#endif +> ++ +> ++static void check_host_md5(void) +> ++{ +> ++ int i; +> ++ unsigned char md[MD5_DIGEST_LENGTH]; +> ++ rcu_read_lock(); +> ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +> +'pc.ram' block */ +> ++ rcu_read_unlock(); +> ++ +> ++ MD5(block->host, block->used_length, md); +> ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { +> ++ fprintf(stderr, "%02x", md[i]); +> ++ } +> ++ fprintf(stderr, "\n"); +> ++ error_report("end ram md5"); +> ++} +> ++ +> +void qemu_savevm_state_begin(QEMUFile *f, +> +const MigrationParams *params) +> +{ +> +@@ -1056,6 +1079,10 @@ void +> +qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only) +> +save_section_header(f, se, QEMU_VM_SECTION_END); +> +> +ret = se->ops->save_live_complete_precopy(f, se->opaque); +> ++ +> ++ fprintf(stderr, "after saving %s complete\n", se->idstr); +> ++ check_host_md5(); +> ++ +> +trace_savevm_section_end(se->idstr, se->section_id, ret); +> +save_section_footer(f, se); +> +if (ret < 0) { +> +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +> +MigrationIncomingState *mis) +> +section_id, le->se->idstr); +> +return ret; +> +} +> ++ if (section_type == QEMU_VM_SECTION_END) { +> ++ error_report("after loading state section id %d(%s)", +> ++ section_id, le->se->idstr); +> ++ check_host_md5(); +> ++ } +> +if (!check_section_footer(f, le)) { +> +return -EINVAL; +> +} +> +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) +> +} +> +> +cpu_synchronize_all_post_init(); +> ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); +> ++ check_host_md5(); +> +> +return ret; +> +} + +> +> +Thanks for describing how to reproduce the bug. +> +If some pages are not transferred to destination then it is a bug, so we need +> +to know what the problem is, notice that the problem can be that TCG is not +> +marking dirty some page, that Migration code "forgets" about that page, or +> +anything eles altogether, that is what we need to find. +> +> +There are more posibilities, I am not sure that memtest is on 32bit mode, and +> +it is inside posibility that we are missing some state when we are on real +> +mode. 
>
> Will try to take a look at this.
>
> Thanks, again.

Hi Juan & Amit,

Do you think we should add a mechanism to check the data integrity during
live migration like Zhijian's patch did? It may be very helpful for
developers.
Actually, I did a similar thing before in order to make sure that I did
the right thing when I changed the code related to live migration.

Liang

On (Fri) 04 Dec 2015 [01:43:07], Li, Liang Z wrote:
> > Thanks for describing how to reproduce the bug.
> > If some pages are not transferred to the destination then it is a bug,
> > so we need to know what the problem is. Notice that the problem could
> > be that TCG is not marking some page dirty, that the migration code
> > "forgets" about that page, or anything else altogether; that is what
> > we need to find.
> >
> > There are more possibilities: I am not sure whether memtest runs in
> > 32-bit mode, and it is within possibility that we are missing some
> > state while we are in real mode.
> >
> > Will try to take a look at this.
> >
> > Thanks, again.
>
> Hi Juan & Amit,
>
> Do you think we should add a mechanism to check the data integrity during
> live migration like Zhijian's patch did? It may be very helpful for
> developers.
> Actually, I did a similar thing before in order to make sure that I did
> the right thing when I changed the code related to live migration.

If you mean for debugging, something that's not always on, then I'm
fine with it.

A script that goes along with it and shows the result of the comparison
would be helpful too: something that shows how many pages are different,
how many bytes in a page differ on average, and so on.

        Amit

diff --git a/results/classifier/108/other/744856 b/results/classifier/108/other/744856
new file mode 100644
index 000000000..e7b473b9c
--- /dev/null
+++ b/results/classifier/108/other/744856
@@ -0,0 +1,30 @@
boot: 0.893
KVM: 0.853
performance: 0.838
device: 0.826
permissions: 0.719
graphic: 0.669
files: 0.626
vnc: 0.615
PID: 0.546
semantic: 0.502
other: 0.404
network: 0.324
debug: 0.317
socket: 0.294

can't boot when using more than 6 disks since qemu-kvm-0.13

It's not possible to pass more than 6 disks to a guest since qemu-kvm-0.13
(also tested with 0.14). If I pass more than 6 disks (as shown below) the
machine complains that there is no bootable disk.

The problem occurs with virtio and without virtio.

e.g.

/usr/bin/qemu-system-x86_64 --enable-kvm -boot c -drive file=/dev/vgr5/fs-01,if=virtio -drive file=/dev/vgr5/fs-01_srv_workspace,if=virtio -drive file=/dev/vgr5/fs-01_srv_media,if=virtio -drive file=/dev/vgr5/fs-01_srv_company,if=virtio -drive file=/dev/vgr5/fs-01_srv_tmp,if=virtio -drive file=/dev/vgr5/fs-01_srv_download,if=virtio -drive file=/dev/vgr5/fs-01_srv_share,if=virtio -drive file=/dev/vgr5/fs-01_srv_backup,if=virtio -drive file=/dev/vgr5/fs-01_srv_private,if=virtio -drive file=/dev/vgr5/fs-01_srv_build,if=virtio -drive file=/dev/vgr5/fs-01_srv_dev,if=virtio -drive file=/dev/vgr5/fs-01_srv_backup2,if=virtio -drive file=/dev/vgr5/fs-01_srv_ftp,if=virtio -cpu qemu64 -smp 2 -m 4G -append root=/dev/vda -usbdevice tablet -net nic,macaddr=90:e6:ba:9d:00:0,model=e1000 -net tap,ifname=tap0,script=/usr/sbin/qemu-ifup,downscript=/usr/sbin/qemu-ifdown -monitor unix:/var/run/kvm/fs-01/monitor,server,nowait -pidfile /var/run/kvm/fs-01/pid -k de -kernel /srv/kvm/kernel/linux-2.6.38-gentoo -append root=/dev/vda -vnc :0 -name fs-01,process=fs-01 -vga std
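One thing worth trying here, as a hedged suggestion rather than a confirmed fix: QEMU/qemu-kvm gained the bootindex device property in the 0.13 timeframe, which tells the firmware exactly which disk to boot even when many drives are attached. A minimal sketch reusing the reporter's first volume (the id name is hypothetical; all other drives and options stay as in the command line above):

```
# boot disk split into -drive (backend) + -device (frontend) with bootindex;
# the remaining -drive options and the rest of the command line are unchanged
/usr/bin/qemu-system-x86_64 --enable-kvm \
  -drive file=/dev/vgr5/fs-01,if=none,id=bootdisk \
  -device virtio-blk-pci,drive=bootdisk,bootindex=1
```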
Triaging old bug tickets... QEMU 0.13 and 0.14 are pretty outdated nowadays, can you still reproduce this behavior with the latest version of QEMU?

[Expired for QEMU because there has been no activity for 60 days.]

diff --git a/results/classifier/108/other/745 b/results/classifier/108/other/745
new file mode 100644
index 000000000..f42fcd67b
--- /dev/null
+++ b/results/classifier/108/other/745
@@ -0,0 +1,51 @@
device: 0.738
graphic: 0.707
boot: 0.662
vnc: 0.544
performance: 0.543
socket: 0.542
files: 0.528
KVM: 0.487
semantic: 0.415
PID: 0.349
permissions: 0.337
debug: 0.318
network: 0.302
other: 0.192

NVRAM is not persistent across coldboots without attached r/w FAT32 hard drive

Description of problem:
NVRAM variables are not persistent across cold boots without an attached readable/writable FAT32 hard drive.

Steps to reproduce:
Without hard drive:
1. Start the VM as above ("without hard drive attached"), and enter the EFI shell.
2. Dump the contents of an NVRAM variable, e.g. Lang. Note the contents.
3. Edit the contents of that variable.
4. Shut down and restart the VM (cold reboot), and enter the EFI shell.
5. Dump the contents of the same NVRAM variable. The contents have reverted to what they were in Step 2.

With hard drive:
1. Start the VM as above ("with hard drive attached"), and enter the EFI shell.
2. Navigate to the hard drive filesystem, e.g. FS0.
3. List the files in the filesystem. If NvVars exists, note the modification time.
4. Edit the contents of an NVRAM variable, e.g. Lang.
5. List the files of the filesystem. The NvVars file either now exists, or has notably been modified since Step 3.

Additional information:
OVMF blobs used: those found in the Debian Sid package "ovmf_2021.11_rc1-1_all.deb" (https://packages.debian.org/sid/ovmf)

Note that, without a hard drive attached, edited NVRAM variables do persist across warm reboots, e.g. via the EFI shell command `reset`.

I have not tested filesystem formats other than FAT32 with the attached hard drive, though I assume that would be futile, as the UEFI specification states that EFI only supports FAT-based filesystems by default.

Without HDD attached, before cold reboot: (screenshot omitted)

Without HDD attached, after cold reboot: (screenshot omitted)

With HDD attached (note modification date / time of NvVars): (screenshot omitted)

This issue leads to modern macOS's installation process failing, as it relies on being able to modify NVRAM variables to track how far along the installation is. Without these variables, the installation loops indefinitely, as it can't know when to move on to the next part of the overall process.

Let me know if more information is needed, or if this is an issue better suited for the OVMF bug tracker (which I do not know the location of).
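A side note on the mechanism, hedged since the report does not include the original command line: OVMF only falls back to the NvVars file on a FAT volume when it has no flash-backed variable store of its own. The usual way to make UEFI variables survive cold boots is to attach the firmware as two pflash drives, the read-only code image plus a writable per-VM copy of the variable store. A minimal sketch, assuming the Debian ovmf package layout under /usr/share/OVMF (the VARS copy and disk image names are hypothetical):

```
# one-time: give this VM a private, writable copy of the variable store
cp /usr/share/OVMF/OVMF_VARS.fd ./macos_VARS.fd

qemu-system-x86_64 \
  -drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE.fd \
  -drive if=pflash,format=raw,file=./macos_VARS.fd \
  -drive file=./disk.img,format=qcow2
```

Variables written from the EFI shell then land in the pflash-backed store and survive cold reboots, with no FAT32 data disk needed.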
diff --git a/results/classifier/108/other/74545755 b/results/classifier/108/other/74545755
new file mode 100644
index 000000000..85e70c209
--- /dev/null
+++ b/results/classifier/108/other/74545755
@@ -0,0 +1,354 @@
permissions: 0.770
debug: 0.740
performance: 0.721
device: 0.720
other: 0.683
semantic: 0.669
KVM: 0.661
graphic: 0.660
vnc: 0.650
boot: 0.607
files: 0.577
network: 0.550
socket: 0.549
PID: 0.479

[Bug Report][RFC PATCH 0/1] block: fix failing assert on paused VM migration

There's a bug (a failing assert) which is reproduced during migration of
a paused VM. I am able to reproduce it on a setup with 2 nodes and a common
NFS share, with the VM's disk on that share.

root@fedora40-1-vm:~# virsh domblklist alma8-vm
 Target   Source
------------------------------------------
 sda      /mnt/shared/images/alma8.qcow2

root@fedora40-1-vm:~# df -Th /mnt/shared
Filesystem          Type  Size  Used Avail Use% Mounted on
127.0.0.1:/srv/nfsd nfs4   63G   16G   48G  25% /mnt/shared

On the 1st node:

root@fedora40-1-vm:~# virsh start alma8-vm ; virsh suspend alma8-vm
root@fedora40-1-vm:~# virsh migrate --compressed --p2p --persistent \
--undefinesource --live alma8-vm qemu+ssh://fedora40-2-vm/system

Then on the 2nd node:

root@fedora40-2-vm:~# virsh migrate --compressed --p2p --persistent \
--undefinesource --live alma8-vm qemu+ssh://fedora40-1-vm/system
error: operation failed: domain is not running

root@fedora40-2-vm:~# tail -3 /var/log/libvirt/qemu/alma8-vm.log
2024-09-19 13:53:33.336+0000: initiating migration
qemu-system-x86_64: ../block.c:6976: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
2024-09-19 13:53:42.991+0000: shutting down, reason=crashed

Backtrace:

(gdb) bt
#0  0x00007f7eaa2f1664 in __pthread_kill_implementation () at /lib64/libc.so.6
#1  0x00007f7eaa298c4e in raise () at /lib64/libc.so.6
#2  0x00007f7eaa280902 in abort () at /lib64/libc.so.6
#3  0x00007f7eaa28081e in __assert_fail_base.cold () at /lib64/libc.so.6
#4  0x00007f7eaa290d87 in __assert_fail () at /lib64/libc.so.6
#5  0x0000563c38b95eb8 in bdrv_inactivate_recurse (bs=0x563c3b6c60c0) at ../block.c:6976
#6  0x0000563c38b95aeb in bdrv_inactivate_all () at ../block.c:7038
#7  0x0000563c3884d354 in qemu_savevm_state_complete_precopy_non_iterable (f=0x563c3b700c20, in_postcopy=false, inactivate_disks=true) at ../migration/savevm.c:1571
#8  0x0000563c3884dc1a in qemu_savevm_state_complete_precopy (f=0x563c3b700c20, iterable_only=false, inactivate_disks=true) at ../migration/savevm.c:1631
#9  0x0000563c3883a340 in migration_completion_precopy (s=0x563c3b4d51f0, current_active_state=<optimized out>) at ../migration/migration.c:2780
#10 migration_completion (s=0x563c3b4d51f0) at ../migration/migration.c:2844
#11 migration_iteration_run (s=0x563c3b4d51f0) at ../migration/migration.c:3270
#12 migration_thread (opaque=0x563c3b4d51f0) at ../migration/migration.c:3536
#13 0x0000563c38dbcf14 in qemu_thread_start (args=0x563c3c2d5bf0) at ../util/qemu-thread-posix.c:541
#14 0x00007f7eaa2ef6d7 in start_thread () at /lib64/libc.so.6
#15 0x00007f7eaa373414 in clone () at /lib64/libc.so.6

What happens here is that after the 1st migration the BDS related to the HDD
remains inactive, as the VM is still paused. Then, when we initiate the 2nd
migration, bdrv_inactivate_all() leads to an attempt to set the
BDRV_O_INACTIVE flag on that node, where it is already set, so the assert
fails.

The attached patch, which simply skips setting the flag if it's already set,
is more of a kludge than a clean solution. Should we use more sophisticated
logic which allows some of the nodes to be in an inactive state prior to the
migration, and takes them into account during bdrv_inactivate_all()?
Comments would be appreciated.

Andrey

Andrey Drobyshev (1):
  block: do not fail when inactivating node which is inactive

 block.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

--
2.39.3

Instead of failing the assert, let's just ignore that the flag is already
set and return. We assume that it's going to be safe to ignore. Otherwise
this assert fails when migrating a paused VM back and forth.

Ideally we'd like to have a more sophisticated solution, e.g.
not even scan the nodes which should be inactive at this point.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
---
 block.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 7d90007cae..c1dcf906d1 100644
--- a/block.c
+++ b/block.c
@@ -6973,7 +6973,15 @@ static int GRAPH_RDLOCK bdrv_inactivate_recurse(BlockDriverState *bs)
         return 0;
     }

-    assert(!(bs->open_flags & BDRV_O_INACTIVE));
+    if (bs->open_flags & BDRV_O_INACTIVE) {
+        /*
+         * Return here instead of failing the assert, as a workaround to
+         * prevent failure when migrating a paused VM.
+         * Here we assume that if we're trying to inactivate a BDS that's
+         * already inactive, it's safe to just ignore it.
+         */
+        return 0;
+    }

     /* Inactivate this node */
     if (bs->drv->bdrv_inactivate) {
--
2.39.3

[add migration maintainers]

On 24.09.24 15:56, Andrey Drobyshev wrote:
> Instead of failing the assert, let's just ignore that the flag is already
> set and return. We assume that it's going to be safe to ignore. Otherwise
> this assert fails when migrating a paused VM back and forth.
>
> Ideally we'd like to have a more sophisticated solution, e.g. not even
> scan the nodes which should be inactive at this point.
>
> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> ---
>  block.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/block.c b/block.c
> index 7d90007cae..c1dcf906d1 100644
> --- a/block.c
> +++ b/block.c
> @@ -6973,7 +6973,15 @@ static int GRAPH_RDLOCK bdrv_inactivate_recurse(BlockDriverState *bs)
>          return 0;
>      }
> -    assert(!(bs->open_flags & BDRV_O_INACTIVE));
> +    if (bs->open_flags & BDRV_O_INACTIVE) {
> +        /*
> +         * Return here instead of failing the assert, as a workaround to
> +         * prevent failure when migrating a paused VM.
> +         * Here we assume that if we're trying to inactivate a BDS that's
> +         * already inactive, it's safe to just ignore it.
> +         */
> +        return 0;
> +    }
>
>      /* Inactivate this node */
>      if (bs->drv->bdrv_inactivate) {

I doubt that this is a correct way to go.

As far as I understand, "inactive" actually means "the storage does not
belong to qemu, but to someone else (another qemu process for example),
and may be changed transparently". In turn this means that QEMU should do
nothing with inactive disks. So the problem is that nobody called
bdrv_activate_all() on the target, and we shouldn't ignore that.

Hmm, I see that in process_incoming_migration_bh() we do call
bdrv_activate_all(), but only in some scenarios. Maybe the condition
should be less strict here.

Why do we need any condition here at all?
Don't we want to activate the block layer on the target after migration
anyway?

--
Best regards,
Vladimir

On 9/30/24 12:25 PM, Vladimir Sementsov-Ogievskiy wrote:
> [add migration maintainers]
>
> On 24.09.24 15:56, Andrey Drobyshev wrote:
> > [...]
>
> I doubt that this is a correct way to go.
>
> As far as I understand, "inactive" actually means "the storage does not
> belong to qemu, but to someone else (another qemu process for example),
> and may be changed transparently". In turn this means that QEMU should
> do nothing with inactive disks. So the problem is that nobody called
> bdrv_activate_all() on the target, and we shouldn't ignore that.
>
> Hmm, I see that in process_incoming_migration_bh() we do call
> bdrv_activate_all(), but only in some scenarios. Maybe the condition
> should be less strict here.
>
> Why do we need any condition here at all? Don't we want to activate the
> block layer on the target after migration anyway?

Hmm, I'm not sure about the unconditional activation, since we at least
have to honor the LATE_BLOCK_ACTIVATE cap if it's set (and probably delay
the activation in such a case). In current libvirt upstream I see such
code:

> /* Migration capabilities which should always be enabled as long as they
>  * are supported by QEMU. If the capability is supposed to be enabled on
>  * both sides of migration, it won't be enabled unless both sides support
>  * it.
>  */
> static const qemuMigrationParamsAlwaysOnItem qemuMigrationParamsAlwaysOn[] = {
>     {QEMU_MIGRATION_CAP_PAUSE_BEFORE_SWITCHOVER,
>      QEMU_MIGRATION_SOURCE},
>
>     {QEMU_MIGRATION_CAP_LATE_BLOCK_ACTIVATE,
>      QEMU_MIGRATION_DESTINATION},
> };

which means that libvirt always wants LATE_BLOCK_ACTIVATE to be set.

The code from process_incoming_migration_bh() you're referring to:

> /* If capability late_block_activate is set:
>  * Only fire up the block code now if we're going to restart the
>  * VM, else 'cont' will do it.
>  * This causes file locking to happen; so we don't want it to happen
>  * unless we really are starting the VM.
>  */
> if (!migrate_late_block_activate() ||
>     (autostart && (!global_state_received() ||
>      runstate_is_live(global_state_get_runstate())))) {
>     /* Make sure all file formats throw away their mutable metadata.
>      * If we get an error here, just don't restart the VM yet. */
>     bdrv_activate_all(&local_err);
>     if (local_err) {
>         error_report_err(local_err);
>         local_err = NULL;
>         autostart = false;
>     }
> }

It states explicitly that we're either going to start the VM right at this
point if (autostart == true), or we wait until the "cont" command happens.
None of this is going to happen if we start another migration while still
being in the PAUSED state. So I think it seems reasonable to take such a
case into account. For instance, this patch does prevent the crash:

> diff --git a/migration/migration.c b/migration/migration.c
> index ae2be31557..3222f6745b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -733,7 +733,8 @@ static void process_incoming_migration_bh(void *opaque)
>       */
>      if (!migrate_late_block_activate() ||
>          (autostart && (!global_state_received() ||
> -         runstate_is_live(global_state_get_runstate())))) {
> +         runstate_is_live(global_state_get_runstate()))) ||
> +        (!autostart && global_state_get_runstate() == RUN_STATE_PAUSED)) {
>          /* Make sure all file formats throw away their mutable metadata.
>           * If we get an error here, just don't restart the VM yet. */
>          bdrv_activate_all(&local_err);

What are your thoughts on it?
Andrey

diff --git a/results/classifier/108/other/746 b/results/classifier/108/other/746
new file mode 100644
index 000000000..f6de0a6f6
--- /dev/null
+++ b/results/classifier/108/other/746
@@ -0,0 +1,16 @@
files: 0.890
other: 0.799
semantic: 0.795
vnc: 0.765
device: 0.762
network: 0.732
debug: 0.692
socket: 0.686
graphic: 0.622
permissions: 0.576
boot: 0.551
KVM: 0.536
performance: 0.534
PID: 0.264

Current file VERSION of tag 6.2.0-rc2 contains 6.2.92, not 6.1.92

diff --git a/results/classifier/108/other/747 b/results/classifier/108/other/747
new file mode 100644
index 000000000..fdd3f419a
--- /dev/null
+++ b/results/classifier/108/other/747
@@ -0,0 +1,45 @@
performance: 0.786
device: 0.668
boot: 0.666
graphic: 0.623
debug: 0.479
PID: 0.455
network: 0.431
semantic: 0.381
permissions: 0.289
files: 0.261
socket: 0.261
vnc: 0.252
KVM: 0.182
other: 0.182

hvf-accelerated aarch64 hangs when switching to big endian mode

Description of problem:
Trying to boot a big-endian Linux kernel using the above command line on an M1 Mac Mini just hangs; there is no output at all. However, by replacing `hvf` with `tcg`, the kernel boots up fine. The kernel also starts if I use KVM acceleration on a Linux host system.

Steps to reproduce:
1. Build a Linux kernel for big-endian arm64
2. Try to boot it with -accel hvf on an M1 Mac
3. Observe a lot of nothing happening :-)

Additional information:
Sample run, TCG vs HVF
```
mikan:/tmp% qemu-system-aarch64 -accel tcg -machine virt,highmem=off -cpu cortex-a72 -nographic -kernel /tmp/vmlinuz-5.10.76-gentoo-r1-arm64.be |& head -16
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[    0.000000] Linux version 5.10.76-gentoo-r1-arm64 (root@localhost) (aarch64-unknown-linux-gnu-gcc (Gentoo 11.2.0 p1) 11.2.0, GNU ld (Gentoo 2.37_p1 p0) 2.37) #1 SMP Sun Nov 21 16:30:21 -00 2021
[    0.000000] Machine model: linux,dummy-virt
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x0000000047ffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x47f65300-0x47f76fff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000040000000-0x0000000047ffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000040000000-0x0000000047ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x0000000047ffffff]
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv0.2 detected in firmware.
mikan:/tmp% qemu-system-aarch64 -accel hvf -machine virt,highmem=off -cpu cortex-a72 -nographic -kernel /tmp/vmlinuz-5.10.76-gentoo-r1-arm64.be
```
(followed by tumbleweeds)

diff --git a/results/classifier/108/other/747583 b/results/classifier/108/other/747583
new file mode 100644
index 000000000..2f42d60ec
--- /dev/null
+++ b/results/classifier/108/other/747583
@@ -0,0 +1,42 @@
KVM: 0.726
device: 0.628
performance: 0.539
graphic: 0.494
network: 0.490
semantic: 0.468
other: 0.451
debug: 0.423
socket: 0.271
permissions: 0.265
vnc: 0.259
PID: 0.241
files: 0.162
boot: 0.161

Windows 2008 Time Zone Change Even When Using -localtime

* What cpu model: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
* What kvm version you are using: qemu-kvm-0.12.3
* The host kernel version: 2.6.32-30-server
* What host kernel arch you are using (i386 or x86_64): x86_64
* What guest you are using, including OS type: Windows 2008 Enterprise x86_64
* The qemu command line you are using to start the guest: /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 1024 -smp 1 -name 2-6176 -uuid 4d1d56b1-d0b7-506b-31a5-a87c8cb0560b -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/2-6176.monitor,server,nowait -monitor chardev:monitor -localtime -boot c -drive file=/dev/disk/by-id/scsi-3600144f05c11090000004d9602950073,if=virtio,index=0,boot=on,format=raw -drive file=/dev/disk/by-id/scsi-3600144f0eae8810000004c7bb0920037,if=ide,media=cdrom,index=2,format=raw -net nic,macaddr=00:00:d1:d0:3f:5e,vlan=0,name=nic.1 -net tap,fd=212,vlan=0,name=tap.1 -net nic,macaddr=00:00:0a:d0:3f:5e,vlan=1,name=nic.1 -net tap,fd=213,vlan=1,name=tap.1 -chardev pty,id=serial0 -serial chardev:serial0 -parallel none -usb -usbdevice tablet -vnc 0.0.0.0:394,password -k en-us -vga cirrus
* Whether the problem goes away if using the -no-kvm-irqchip or -no-kvm-pit switch: unable to test
* Whether the problem also appears with the -no-kvm switch: unable to test

Host time zone: EDT
Guest time zone: PDT

Steps to reproduce:
1) Set the time zone to (GMT-08:00) Pacific Time (US & Canada) on the guest
2) Power off the Windows 2008 Enterprise x86_64 guest completely. Ensure the kvm process exits.
3) Power on the Windows 2008 Enterprise x86_64 guest using virsh start <domain>
4) The server will show EDT time but still have the time zone set to (GMT-08:00) Pacific Time (US & Canada).

After stopping and starting the kvm process, syncing the time using the Windows "Internet Time" NTP sync restores the correct PDT time.

Rebooting from within the guest's operating system, where the kvm process does not exit, does not cause the time shift to happen.

QEMU 0.12 is completely outdated nowadays... can you still reproduce this issue with the latest version of QEMU?

[Expired for QEMU because there has been no activity for 60 days.]
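One observation, hedged since it goes beyond what the reporter tested: -localtime tracks the host's local time, so with the host in EDT and the guest's zone set to PDT, the emulated RTC would be off by exactly the zone difference, which seems consistent with the symptom above. On current QEMU the option is spelled -rtc base=localtime; a minimal equivalent sketch (disk path hypothetical):

```
qemu-system-x86_64 -enable-kvm -m 1024 \
  -rtc base=localtime,clock=host \
  -drive file=/path/to/win2008.img,if=virtio,format=raw
```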
diff --git a/results/classifier/108/other/748 b/results/classifier/108/other/748
new file mode 100644
index 000000000..9b4d765ee
--- /dev/null
+++ b/results/classifier/108/other/748
@@ -0,0 +1,16 @@
KVM: 0.888
device: 0.807
network: 0.763
performance: 0.748
vnc: 0.526
PID: 0.514
boot: 0.480
graphic: 0.437
socket: 0.403
files: 0.398
other: 0.380
semantic: 0.327
debug: 0.289
permissions: 0.219

Enable postcopy migration for mixed Hugepage backed KVM guests and improve handling of dirty-page tracking by QEMU/KVM

diff --git a/results/classifier/108/other/749 b/results/classifier/108/other/749
new file mode 100644
index 000000000..67881e1a3
--- /dev/null
+++ b/results/classifier/108/other/749
@@ -0,0 +1,16 @@
device: 0.806
performance: 0.714
network: 0.567
boot: 0.302
graphic: 0.228
socket: 0.190
semantic: 0.185
vnc: 0.158
permissions: 0.138
PID: 0.111
other: 0.081
files: 0.041
debug: 0.036
KVM: 0.002

Enhance QEMU live patching