virsh migration copy-storage-all fails with "Unable to read from monitor: Connection reset by peer"
virsh migration copy-storage-all fails with "Unable to read from monitor: Connection reset by peer" and shut downs the guest on the source host.
Kernel Version: 3.10.0-rc5+
Libvirt Version: 1.0.6
Qemu Version: 1.5.50
Steps to reproduce the issue:
----------------------------------------
1. Created the qemu-img create -f qcow2 vm.qcow2 11G on the destination host which is same as the source.
2. Started the guest on the source
3. Started the vncdisplay to monitor the guest
4. Initiated the migration "virsh migrate --live VM1 qemu+ssh://host-ip/system tcp://host-ip --verbose --copy-storage-all"
5. It started the copying the storage from souce to destination (conitinously monitored it was growing)
6. Guest on the destination was paused and was running on the source
7. At some point the VM on the source got shutdown and migration failed with "Unable to read from monitor: Connection reset by peer"
Attached the libvirt debug logs.
The debug logs shows :
2013-06-19 08:43:12.253+0000: 4026: debug : virEventPollInterruptLocked:716 : Interrupting
2013-06-19 08:43:12.253+0000: 4026: debug : virEventPollAddTimeout:248 : EVENT_POLL_ADD_TIMEOUT: timer=1 frequency=0 cb=0x7fe930baa960 opaque=(nil) ff=(nil)
Note: The virsh live migration works fine with nfs storage from source to destination and vice versa.
With libvirt 1.0.5 and qemu 1.5 also we were facing the same issue, but with that even "Live migration with nfs also was not working".
Guest XML:
----------------
VM147feb0e1-0c23-9be9-da12-2ead34864de2409600020480001hvmdestroyrestartrestart/usr/local/bin/qemu-system-x86_64
Thanks for submitting this bug. I can' t reproduce it with an empty image (sitting at cd boot menu).
What is the underlying fs (/home/images) on both source and destination?
Could you run 'apport-collect 1192499' on the destination host?
Oh, actually I notice you have
/usr/local/bin/qemu-system-x86_64
listed as the emulator in the .xml. That is not the qemu-system-x86_64 shipped with the qemu-system-x86 package. Does switching that for
/usr/bin/qemu-system-x86_64
and see if that helps matters?
We tried with /usr/bin/qemu-system-x86_64 and we are facing the same issue.
For both source and destination the fs is ext4.
We are using fedora base, hence I have attached the sosreport of both source and destination.
On 06/29/2013 12:15 AM, Serge Hallyn wrote:
> Thanks for submitting this bug. I can' t reproduce it with an empty
> image (sitting at cd boot menu).
>
> What is the underlying fs (/home/images) on both source and destination?
For both source and destination the fs is ext4.
>
> Could you run 'apport-collect 1192499' on the destination host?
We are using fedora base, hence I have attached the sosreport of both
source and destination in the bug.
>
> Oh, actually I notice you have
>
> /usr/local/bin/qemu-system-x86_64
>
> listed as the emulator in the .xml. That is not the qemu-system-x86_64
> shipped with the qemu-system-x86 package. Does switching that for
>
> /usr/bin/qemu-system-x86_64
We tried with /usr/bin/qemu-system-x86_64 and we
are facing the same issue.
>
> and see if that helps matters?
>
> ** Changed in: libvirt (Ubuntu)
> Status: New => Incomplete
>
Thanks,
Shastri
Hi,
I'm confused - what do you mean by you are using a fedora base? Are you talking about the VM, or the destination host? If the destination host, then (a) is it identical to the source host, and either way (b) exactly how did you set up the hosts with ubuntu packages?
I am sorry.
Host Kernel : 3.10.0-rc5+ [upstream kernel fedora base], both the source and destination host are the same.
We test the upstream kernel, qemu and libvirt. We don't have ubuntu.
> I am sorry.
>
> Host Kernel : 3.10.0-rc5+ [upstream kernel fedora base], both the source
> and destination host are the same.
>
> We test the upstream kernel, qemu and libvirt. We don't have ubuntu.
Oh, I see - then please see http://libvirt.org/bugs.html (specifically
"General libvirt bug reports") for where to file upstream bug reports.
Moving to qemu component as qemu is crashing based on the inputs from Michal Privoznik
Bugzilla : Bug 979411 - virsh migration copy-storage-all fails with "Unable to read from monitor: Connection reset by peer"
The destination VM's log says:
qemu: warning: error while loading state section id 1
This indicates that either there was an error on the destination while loading state or the migration stream got out of sync.
Please check that QEMU on source and destination are identical. If you are running different versions of QEMU on source and destination this could be the cause.
Try with qemu.git/master and a simple QEMU command-line (without libvirt):
qemu-system-x86_64 -machine pc-i440fx-1.5,accel=kvm -m 4000 \
-drive file=/home/images/rhel64-64.qcow2,if=ide,format=qcow2,cache=none
Use the same command-line on the destination but also add "-incoming tcp::1234". To start the migrate on the source, run "migrate -b tcp::1234" in the QEMU monitor.
If the failure can be reproduce on qemu.git/master in this way it will be easier to debug.
I will be away next week and therefore unable to look into this more.
Thanks, I ll work on this and update it.
On 07/05/2013 01:52 PM, Stefan Hajnoczi wrote:
> The destination VM's log says:
>
> qemu: warning: error while loading state section id 1
>
> This indicates that either there was an error on the destination while
> loading state or the migration stream got out of sync.
>
> Please check that QEMU on source and destination are identical. If you
> are running different versions of QEMU on source and destination this
> could be the cause.
>
> Try with qemu.git/master and a simple QEMU command-line (without
> libvirt):
>
> qemu-system-x86_64 -machine pc-i440fx-1.5,accel=kvm -m 4000 \
> -drive file=/home/images/rhel64-64.qcow2,if=ide,format=qcow2,cache=none
>
> Use the same command-line on the destination but also add "-incoming
> tcp::1234". To start the migrate on the source, run "migrate -b tcp
> ::1234" in the QEMU monitor.
I tried this "migrate -b tcp ::1234" comes out and gives
me the qemu prompt [ I mean it doesn't wait till the migration
completes] and also on the destination the image is not growing, the
status shows paused.
~Shastri
>
> If the failure can be reproduce on qemu.git/master in this way it will
> be easier to debug.
>
> I will be away next week and therefore unable to look into this more.
>
> "migrate -b tcp ::1234"
There should be no space between tcp and the rest of the connection details:
migrate -b tcp::1234
Is there still something left to do here, or could we close this ticket nowadays?
There hasn't been a reply to my question in the last comment within
months, so I assume nobody cares about this anymore. So I'm closing this
ticket now...
Created attachment 766573
Source-migrate-logs
Description of problem:
virsh migration copy-storage-all fails with "Unable to read from monitor: Connection reset by peer" and shut downs the guest on the source host.
Kernel Version: 3.10.0-rc5+
Libvirt Version: 1.0.6
Qemu Version: 1.5.50
Steps to Reproduce:
1. Created the qemu-img create -f qcow2 vm.qcow2 11G on the destination host which is same as the source.
2. Started the guest on the source
3. Started the vncdisplay to monitor the guest
4. Initiated the migration "virsh migrate --live VM1 qemu+ssh://host-ip/system tcp://host-ip --verbose --copy-storage-all"
5. It started the copying the storage from souce to destination (conitinously monitored it was growing)
6. Guest on the destination was paused and was running on the source
7. At some point the VM on the source got shutdown and migration failed with "Unable to read from monitor: Connection reset by peer"
Attached the libvirt debug logs.
The debug logs shows :
2013-06-19 08:43:12.253+0000: 4026: debug : virEventPollInterruptLocked:716 : Interrupting
2013-06-19 08:43:12.253+0000: 4026: debug : virEventPollAddTimeout:248 : EVENT_POLL_ADD_TIMEOUT: timer=1 frequency=0 cb=0x7fe930baa960 opaque=(nil) ff=(nil)
Note: The virsh live migration works fine with nfs storage from source to destination and vice versa.
With libvirt 1.0.5 and qemu 1.5 also we were facing the same issue, but with that even "Live migration with nfs also was not working".
Guest XML:
----------------
VM147feb0e1-0c23-9be9-da12-2ead34864de2409600020480001hvmdestroyrestartrestart/usr/local/bin/qemu-system-x86_64
Can you provide the destination debug logs? Esp. content of /var/log/libvirt/qemu/VM1.log as there's supposed to be the reason why qemu died.
Created attachment 766578
Source Logs
Created attachment 766579
Destination Logs
From the VM1_dest.log:
...
Completed 97 %
Completed 98 %
Completed 99 %
qemu: warning: error while loading state section id 1
load of migration failed
So the qemu is unable to migrate itself. Therefore I think this is actually a qemu bug.
On the other hand, I wonder why the guest on the source is shut down. There's no sign of that in the logs.
When the migration fails the guest gets shutdown on the source.
[root@9 images]# virsh list --all
Id Name State
----------------------------------------------------
5 VM1 running
[root@9 images]# virsh migrate --live rhel64-64 qemu+ssh:/IP/system tcp://IP --verbose --copy-storage-all
Migration: [ 93 %]error: Unable to read from monitor: Connection reset by peer
At the destination:
[root@9 images]# virsh list --all
Id Name State
----------------------------------------------------
- VM1 shut off
[root@destination]# virsh list --all
Id Name State
----------------------------------------------------
16 VM1 paused
[root@destination]# virsh list --all
Id Name State
----------------------------------------------------
Attached the SOS report of the Source and Destination machines.
Created attachment 767310
source sos
Created attachment 767311
Destination sos
Unfortunately, the libvirtd.log is missing. I've written some guide as well:
http://wiki.libvirt.org/page/DebugLogs
Please set the correct debug logs, reproduce and attach the new reports.
Attached libvirtd logs of both Source and Destination
Created attachment 767340
source libvirtd log
Created attachment 767342
Destination libvird logs
From the *source* libvirtd log:
2013-07-01 11:30:29.582+0000: 3164: debug : virObjectRef:297 : OBJECT_REF: obj=0x7fe97000cb00
2013-07-01 11:30:29.582+0000: 3164: error : qemuMonitorIORead:511 : Unable to read from monitor: Connection reset by peer
2013-07-01 11:30:29.582+0000: 3164: debug : qemuMonitorIO:644 : Error on monitor Unable to read from monitor: Connection reset by peer
2013-07-01 11:30:29.582+0000: 3164: debug : virObjectUnref:260 : OBJECT_UNREF: obj=0x7fe97000cb00
2013-07-01 11:30:29.582+0000: 3165: debug : qemuMonitorSend:905 : Send command resulted in error Unable to read from monitor: Connection reset by peer
2013-07-01 11:30:29.582+0000: 3164: debug : qemuMonitorIO:678 : Triggering error callback
2013-07-01 11:30:29.582+0000: 3164: debug : qemuProcessHandleMonitorError:351 : Received error on 0x7fe9881337f0 'rhel64-64'
This means, qemu died unexpectedly. The qemu error message should be in /var/log/libvirt/qemu/rhel64-64.log on the source.
Since this is a qemu bug (probably fixed already) I'm closing this one. If you disagree please reopen.