From 8d60280e4f1621e13aea4aa593b5bc3e2af34e9d Mon Sep 17 00:00:00 2001
From: Fabiano Rosas
Date: Mon, 17 Jun 2024 15:57:30 -0300
Subject: migration: Add documentation for fdset with multifd + file

With the last few changes to the fdset infrastructure, we now allow
multifd to use an fdset when migrating to a file. This is useful for
the scenario where the management layer wants to have control over the
migration file.

By receiving the file descriptors directly, QEMU can delegate some high
level operating system operations to the management layer (such as
mandatory access control). The management layer might also want to add
its own headers before the migration stream.

Document the "file:/dev/fdset/#" syntax for the multifd migration with
mapped-ram. The requirements for the fdset mechanism are:

- the fdset must contain two fds that are not duplicates between
  themselves;

- if direct-io is to be used, exactly one of the fds must have the
  O_DIRECT flag set;

- the file must be opened with WRONLY on the migration source side;

- the file must be opened with RDONLY on the migration destination
  side.

Reviewed-by: Peter Xu
Signed-off-by: Fabiano Rosas
---
 docs/devel/migration/main.rst       | 24 +++++++++++++++++++-----
 docs/devel/migration/mapped-ram.rst |  6 +++++-
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
index 495cdcb112..784c899dca 100644
--- a/docs/devel/migration/main.rst
+++ b/docs/devel/migration/main.rst
@@ -47,11 +47,25 @@ over any transport.
   QEMU interference. Note that QEMU does not flush cached file
   data/metadata at the end of migration.
 
-In addition, support is included for migration using RDMA, which
-transports the page data using ``RDMA``, where the hardware takes care of
-transporting the pages, and the load on the CPU is much lower. While the
-internals of RDMA migration are a bit different, this isn't really visible
-outside the RAM migration code.
+  The file migration also supports using a file that has already been
+  opened. A set of file descriptors is passed to QEMU via an "fdset"
+  (see add-fd QMP command documentation). This method allows a
+  management application to have control over the migration file
+  opening operation. There are, however, strict requirements to this
+  interface if the multifd capability is enabled:
+
+    - the fdset must contain two file descriptors that are not
+      duplicates between themselves;
+    - if the direct-io capability is to be used, exactly one of the
+      file descriptors must have the O_DIRECT flag set;
+    - the file must be opened with WRONLY on the migration source side
+      and RDONLY on the migration destination side.
+
+- rdma migration: support is included for migration using RDMA, which
+  transports the page data using ``RDMA``, where the hardware takes
+  care of transporting the pages, and the load on the CPU is much
+  lower. While the internals of RDMA migration are a bit different,
+  this isn't really visible outside the RAM migration code.
 
 All these migration protocols use the same infrastructure to
 save/restore state devices. This infrastructure is shared with the
diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst
index fa4cefd9fc..d352b546e9 100644
--- a/docs/devel/migration/mapped-ram.rst
+++ b/docs/devel/migration/mapped-ram.rst
@@ -16,7 +16,7 @@ location in the file, rather than constantly being
 added to a sequential stream.
 
 Having the pages at fixed offsets also allows the usage of
 O_DIRECT for save/restore of the migration stream as the pages
 are ensured to be written respecting O_DIRECT alignment
-restrictions (direct-io support not yet implemented).
+restrictions.
 
 Usage
 -----
@@ -35,6 +35,10 @@ Use a ``file:`` URL for migration:
 Mapped-ram migration is best done non-live, i.e. by stopping the VM on
 the source side before migrating.
 
+For best performance enable the ``direct-io`` parameter as well:
+
+  ``migrate_set_parameter direct-io on``
+
 Use-cases
 ---------

From 21e89f7ad526f0dddfc722e615bfb0fcdb705c87 Mon Sep 17 00:00:00 2001
From: Peter Xu
Date: Wed, 19 Jun 2024 18:30:41 -0400
Subject: migration/docs: Update postcopy recover session for SETUP phase

Firstly, the "Paused" state was previously added in the wrong place:
the state machine section was describing PostcopyState rather than
MigrationStatus. Drop the Paused state descriptions.

Then, in the postcopy recover session, add more information on the
MigrationStatus state machine along the way, and add the new
RECOVER_SETUP phase.

Reviewed-by: Fabiano Rosas
Signed-off-by: Peter Xu
[fix typo s/reconnects/reconnect]
Signed-off-by: Fabiano Rosas
---
 docs/devel/migration/postcopy.rst | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/docs/devel/migration/postcopy.rst b/docs/devel/migration/postcopy.rst
index 6c51e96d79..82e7a848c6 100644
--- a/docs/devel/migration/postcopy.rst
+++ b/docs/devel/migration/postcopy.rst
@@ -99,17 +99,6 @@ ADVISE->DISCARD->LISTEN->RUNNING->END
     (although it can't do the cleanup it would do as it
     finishes a normal migration).
 
- - Paused
-
-    Postcopy can run into a paused state (normally on both sides when
-    happens), where all threads will be temporarily halted mostly due to
-    network errors. When reaching paused state, migration will make sure
-    the qemu binary on both sides maintain the data without corrupting
-    the VM. To continue the migration, the admin needs to fix the
-    migration channel using the QMP command 'migrate-recover' on the
-    destination node, then resume the migration using QMP command 'migrate'
-    again on source node, with resume=true flag set.
-
 - End
 
     The listen thread can now quit, and perform the cleanup of migration
@@ -221,7 +210,8 @@ paused postcopy migration.
 
 The recovery phase normally contains a few steps:
 
-  - When network issue occurs, both QEMU will go into PAUSED state
+  - When network issue occurs, both QEMU will go into **POSTCOPY_PAUSED**
+    migration state.
 
   - When the network is recovered (or a new network is provided), the
     admin can setup the new channel for migration using QMP command
@@ -229,9 +219,20 @@ The recovery phase normally contains a few steps:
 
   - On source host, the admin can continue the interrupted postcopy
     migration using QMP command 'migrate' with resume=true flag set.
-
-  - After the connection is re-established, QEMU will continue the postcopy
-    migration on both sides.
+    Source QEMU will go into **POSTCOPY_RECOVER_SETUP** state trying to
+    re-establish the channels.
+
+  - When both sides of QEMU successfully reconnect using a new or fixed up
+    channel, they will go into **POSTCOPY_RECOVER** state, some handshake
+    procedure will be needed to properly synchronize the VM states between
+    the two QEMUs to continue the postcopy migration.
+    For example, there can be pages sent right during the window when
+    the network is interrupted, then the handshake will guarantee
+    pages lost in-flight will be resent again.
+
+  - After a proper handshake synchronization, QEMU will continue the
+    postcopy migration on both sides and go back to **POSTCOPY_ACTIVE**
+    state. Postcopy migration will continue.
 
 During a paused postcopy migration, the VM can logically still continue
 running, and it will not be impacted from any page access to pages that
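
The QMP flows documented in these two patches may be easier to follow
with concrete command sequences. The sketches below are illustrative
only and not part of the patches: the fdset-id, channel count, return
values, and recovery address are invented, and each add-fd call must
carry one of the file descriptors over the QMP UNIX socket via
SCM_RIGHTS.

For the multifd + mapped-ram file migration, the management application
opens the migration file itself on the source with O_WRONLY (adding
O_DIRECT on exactly one of the two descriptors if direct-io is wanted)
and hands both descriptors to QEMU:

  -> { "execute": "add-fd", "arguments": { "fdset-id": 1 } }
  <- { "return": { "fd": 42, "fdset-id": 1 } }
  -> { "execute": "add-fd", "arguments": { "fdset-id": 1 } }
  <- { "return": { "fd": 43, "fdset-id": 1 } }
  -> { "execute": "migrate-set-capabilities",
       "arguments": { "capabilities": [
         { "capability": "multifd",    "state": true },
         { "capability": "mapped-ram", "state": true } ] } }
  <- { "return": {} }
  -> { "execute": "migrate-set-parameters",
       "arguments": { "multifd-channels": 2, "direct-io": true } }
  <- { "return": {} }
  -> { "execute": "migrate", "arguments": { "uri": "file:/dev/fdset/1" } }
  <- { "return": {} }

The destination mirrors this with the descriptors opened RDONLY and
'migrate-incoming' in place of 'migrate'.

For the postcopy recovery flow, assuming a hypothetical replacement
channel at tcp:192.168.1.2:5656, the admin first repairs the channel on
the destination, then resumes from the source:

  (destination)
  -> { "execute": "migrate-recover",
       "arguments": { "uri": "tcp:192.168.1.2:5656" } }
  <- { "return": {} }

  (source)
  -> { "execute": "migrate",
       "arguments": { "uri": "tcp:192.168.1.2:5656", "resume": true } }
  <- { "return": {} }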