summary refs log tree commit diff stats
path: root/docs/specs/vhost-user.txt
diff options
context:
space:
mode:
Diffstat (limited to 'docs/specs/vhost-user.txt')
-rw-r--r--docs/specs/vhost-user.txt620
1 files changed, 0 insertions, 620 deletions
diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
deleted file mode 100644
index 481ab56e35..0000000000
--- a/docs/specs/vhost-user.txt
+++ /dev/null
@@ -1,620 +0,0 @@
-Vhost-user Protocol
-===================
-
-Copyright (c) 2014 Virtual Open Systems Sarl.
-
-This work is licensed under the terms of the GNU GPL, version 2 or later.
-See the COPYING file in the top-level directory.
-===================
-
-This protocol is aiming to complement the ioctl interface used to control the
-vhost implementation in the Linux kernel. It implements the control plane needed
-to establish virtqueue sharing with a user space process on the same host. It
-uses communication over a Unix domain socket to share file descriptors in the
-ancillary data of the message.
-
-The protocol defines 2 sides of the communication, master and slave. Master is
-the application that shares its virtqueues, in our case QEMU. Slave is the
-consumer of the virtqueues.
-
-In the current implementation QEMU is the Master, and the Slave is intended to
-be a software Ethernet switch running in user space, such as Snabbswitch.
-
-Master and slave can be either a client (i.e. connecting) or server (listening)
-in the socket communication.
-
-Message Specification
----------------------
-
-Note that all numbers are in the machine native byte order. A vhost-user message
-consists of 3 header fields and a payload:
-
-------------------------------------
-| request | flags | size | payload |
-------------------------------------
-
- * Request: 32-bit type of the request
- * Flags: 32-bit bit field:
-   - Lower 2 bits are the version (currently 0x01)
-   - Bit 2 is the reply flag - needs to be sent on each reply from the slave
-   - Bit 3 is the need_reply flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for
-     details.
- * Size - 32-bit size of the payload
-
-
-Depending on the request type, payload can be:
-
- * A single 64-bit integer
-   -------
-   | u64 |
-   -------
-
-   u64: a 64-bit unsigned integer
-
- * A vring state description
-   ---------------
-  | index | num |
-  ---------------
-
-   Index: a 32-bit index
-   Num: a 32-bit number
-
- * A vring address description
-   --------------------------------------------------------------
-   | index | flags | size | descriptor | used | available | log |
-   --------------------------------------------------------------
-
-   Index: a 32-bit vring index
-   Flags: a 32-bit vring flags
-   Descriptor: a 64-bit user address of the vring descriptor table
-   Used: a 64-bit user address of the vring used ring
-   Available: a 64-bit user address of the vring available ring
-   Log: a 64-bit guest address for logging
-
- * Memory regions description
-   ---------------------------------------------------
-   | num regions | padding | region0 | ... | region7 |
-   ---------------------------------------------------
-
-   Num regions: a 32-bit number of regions
-   Padding: 32-bit
-
-   A region is:
-   -----------------------------------------------------
-   | guest address | size | user address | mmap offset |
-   -----------------------------------------------------
-
-   Guest address: a 64-bit guest address of the region
-   Size: a 64-bit size
-   User address: a 64-bit user address
-   mmap offset: 64-bit offset where region starts in the mapped memory
-
-* Log description
-   ---------------------------
-   | log size | log offset |
-   ---------------------------
-   log size: size of area used for logging
-   log offset: offset from start of supplied file descriptor
-       where logging starts (i.e. where guest address 0 would be logged)
-
- * An IOTLB message
-   ---------------------------------------------------------
-   | iova | size | user address | permissions flags | type |
-   ---------------------------------------------------------
-
-   IOVA: a 64-bit I/O virtual address programmed by the guest
-   Size: a 64-bit size
-   User address: a 64-bit user address
-   Permissions: a 8-bit value:
-    - 0: No access
-    - 1: Read access
-    - 2: Write access
-    - 3: Read/Write access
-   Type: a 8-bit IOTLB message type:
-    - 1: IOTLB miss
-    - 2: IOTLB update
-    - 3: IOTLB invalidate
-    - 4: IOTLB access fail
-
-In QEMU the vhost-user message is implemented with the following struct:
-
-typedef struct VhostUserMsg {
-    VhostUserRequest request;
-    uint32_t flags;
-    uint32_t size;
-    union {
-        uint64_t u64;
-        struct vhost_vring_state state;
-        struct vhost_vring_addr addr;
-        VhostUserMemory memory;
-        VhostUserLog log;
-        struct vhost_iotlb_msg iotlb;
-    };
-} QEMU_PACKED VhostUserMsg;
-
-Communication
--------------
-
-The protocol for vhost-user is based on the existing implementation of vhost
-for the Linux Kernel. Most messages that can be sent via the Unix domain socket
-implementing vhost-user have an equivalent ioctl to the kernel implementation.
-
-The communication consists of master sending message requests and slave sending
-message replies. Most of the requests don't require replies. Here is a list of
-the ones that do:
-
- * VHOST_USER_GET_FEATURES
- * VHOST_USER_GET_PROTOCOL_FEATURES
- * VHOST_USER_GET_VRING_BASE
- * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
-
-[ Also see the section on REPLY_ACK protocol extension. ]
-
-There are several messages that the master sends with file descriptors passed
-in the ancillary data:
-
- * VHOST_USER_SET_MEM_TABLE
- * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
- * VHOST_USER_SET_LOG_FD
- * VHOST_USER_SET_VRING_KICK
- * VHOST_USER_SET_VRING_CALL
- * VHOST_USER_SET_VRING_ERR
- * VHOST_USER_SET_SLAVE_REQ_FD
-
-If Master is unable to send the full message or receives a wrong reply it will
-close the connection. An optional reconnection mechanism can be implemented.
-
-Any protocol extensions are gated by protocol feature bits,
-which allows full backwards compatibility on both master
-and slave.
-As older slaves don't support negotiating protocol features,
-a feature bit was dedicated for this purpose:
-#define VHOST_USER_F_PROTOCOL_FEATURES 30
-
-Starting and stopping rings
-----------------------
-Client must only process each ring when it is started.
-
-Client must only pass data between the ring and the
-backend, when the ring is enabled.
-
-If ring is started but disabled, client must process the
-ring without talking to the backend.
-
-For example, for a networking device, in the disabled state
-client must not supply any new RX packets, but must process
-and discard any TX packets.
-
-If VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring is initialized
-in an enabled state.
-
-If VHOST_USER_F_PROTOCOL_FEATURES has been negotiated, the ring is initialized
-in a disabled state. Client must not pass data to/from the backend until ring is enabled by
-VHOST_USER_SET_VRING_ENABLE with parameter 1, or after it has been disabled by
-VHOST_USER_SET_VRING_ENABLE with parameter 0.
-
-Each ring is initialized in a stopped state, client must not process it until
-ring is started, or after it has been stopped.
-
-Client must start ring upon receiving a kick (that is, detecting that file
-descriptor is readable) on the descriptor specified by
-VHOST_USER_SET_VRING_KICK, and stop ring upon receiving
-VHOST_USER_GET_VRING_BASE.
-
-While processing the rings (whether they are enabled or not), client must
-support changing some configuration aspects on the fly.
-
-Multiple queue support
-----------------------
-
-Multiple queue is treated as a protocol extension, hence the slave has to
-implement protocol features first. The multiple queues feature is supported
-only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set.
-
-The max number of queues the slave supports can be queried with message
-VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the number of
-requested queues is bigger than that.
-
-As all queues share one connection, the master uses a unique index for each
-queue in the sent message to identify a specified queue. One queue pair
-is enabled initially. More queues are enabled dynamically, by sending
-message VHOST_USER_SET_VRING_ENABLE.
-
-Migration
----------
-
-During live migration, the master may need to track the modifications
-the slave makes to the memory mapped regions. The client should mark
-the dirty pages in a log. Once it complies to this logging, it may
-declare the VHOST_F_LOG_ALL vhost feature.
-
-To start/stop logging of data/used ring writes, server may send messages
-VHOST_USER_SET_FEATURES with VHOST_F_LOG_ALL and VHOST_USER_SET_VRING_ADDR with
-VHOST_VRING_F_LOG in ring's flags set to 1/0, respectively.
-
-All the modifications to memory pointed by vring "descriptor" should
-be marked. Modifications to "used" vring should be marked if
-VHOST_VRING_F_LOG is part of ring's flags.
-
-Dirty pages are of size:
-#define VHOST_LOG_PAGE 0x1000
-
-The log memory fd is provided in the ancillary data of
-VHOST_USER_SET_LOG_BASE message when the slave has
-VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature.
-
-The size of the log is supplied as part of VhostUserMsg
-which should be large enough to cover all known guest
-addresses. Log starts at the supplied offset in the
-supplied file descriptor.
-The log covers from address 0 to the maximum of guest
-regions. In pseudo-code, to mark page at "addr" as dirty:
-
-page = addr / VHOST_LOG_PAGE
-log[page / 8] |= 1 << page % 8
-
-Where addr is the guest physical address.
-
-Use atomic operations, as the log may be concurrently manipulated.
-
-Note that when logging modifications to the used ring (when VHOST_VRING_F_LOG
-is set for this ring), log_guest_addr should be used to calculate the log
-offset: the write to first byte of the used ring is logged at this offset from
-log start. Also note that this value might be outside the legal guest physical
-address range (i.e. does not have to be covered by the VhostUserMemory table),
-but the bit offset of the last byte of the ring must fall within
-the size supplied by VhostUserLog.
-
-VHOST_USER_SET_LOG_FD is an optional message with an eventfd in
-ancillary data, it may be used to inform the master that the log has
-been modified.
-
-Once the source has finished migration, rings will be stopped by
-the source. No further update must be done before rings are
-restarted.
-
-IOMMU support
--------------
-
-When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master
-sends IOTLB entries update & invalidation by sending VHOST_USER_IOTLB_MSG
-requests to the slave with a struct vhost_iotlb_msg as payload. For update
-events, the iotlb payload has to be filled with the update message type (2),
-the I/O virtual address, the size, the user virtual address, and the
-permissions flags. Addresses and size must be within vhost memory regions set
-via the VHOST_USER_SET_MEM_TABLE request. For invalidation events, the iotlb
-payload has to be filled with the invalidation message type (3), the I/O virtual
-address and the size. On success, the slave is expected to reply with a zero
-payload, non-zero otherwise.
-
-The slave relies on the slave communcation channel (see "Slave communication"
-section below) to send IOTLB miss and access failure events, by sending
-VHOST_USER_SLAVE_IOTLB_MSG requests to the master with a struct vhost_iotlb_msg
-as payload. For miss events, the iotlb payload has to be filled with the miss
-message type (1), the I/O virtual address and the permissions flags. For access
-failure event, the iotlb payload has to be filled with the access failure
-message type (4), the I/O virtual address and the permissions flags.
-For synchronization purpose, the slave may rely on the reply-ack feature,
-so the master may send a reply when operation is completed if the reply-ack
-feature is negotiated and slaves requests a reply. For miss events, completed
-operation means either master sent an update message containing the IOTLB entry
-containing requested address and permission, or master sent nothing if the IOTLB
-miss message is invalid (invalid IOVA or permission).
-
-The master isn't expected to take the initiative to send IOTLB update messages,
-as the slave sends IOTLB miss messages for the guest virtual memory areas it
-needs to access.
-
-Slave communication
--------------------
-
-An optional communication channel is provided if the slave declares
-VHOST_USER_PROTOCOL_F_SLAVE_REQ protocol feature, to allow the slave to make
-requests to the master.
-
-The fd is provided via VHOST_USER_SET_SLAVE_REQ_FD ancillary data.
-
-A slave may then send VHOST_USER_SLAVE_* messages to the master
-using this fd communication channel.
-
-Protocol features
------------------
-
-#define VHOST_USER_PROTOCOL_F_MQ             0
-#define VHOST_USER_PROTOCOL_F_LOG_SHMFD      1
-#define VHOST_USER_PROTOCOL_F_RARP           2
-#define VHOST_USER_PROTOCOL_F_REPLY_ACK      3
-#define VHOST_USER_PROTOCOL_F_MTU            4
-#define VHOST_USER_PROTOCOL_F_SLAVE_REQ      5
-
-Master message types
---------------------
-
- * VHOST_USER_GET_FEATURES
-
-      Id: 1
-      Equivalent ioctl: VHOST_GET_FEATURES
-      Master payload: N/A
-      Slave payload: u64
-
-      Get from the underlying vhost implementation the features bitmask.
-      Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for
-      VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES.
-
- * VHOST_USER_SET_FEATURES
-
-      Id: 2
-      Ioctl: VHOST_SET_FEATURES
-      Master payload: u64
-
-      Enable features in the underlying vhost implementation using a bitmask.
-      Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for
-      VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES.
-
- * VHOST_USER_GET_PROTOCOL_FEATURES
-
-      Id: 15
-      Equivalent ioctl: VHOST_GET_FEATURES
-      Master payload: N/A
-      Slave payload: u64
-
-      Get the protocol feature bitmask from the underlying vhost implementation.
-      Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
-      VHOST_USER_GET_FEATURES.
-      Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support
-      this message even before VHOST_USER_SET_FEATURES was called.
-
- * VHOST_USER_SET_PROTOCOL_FEATURES
-
-      Id: 16
-      Ioctl: VHOST_SET_FEATURES
-      Master payload: u64
-
-      Enable protocol features in the underlying vhost implementation.
-      Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
-      VHOST_USER_GET_FEATURES.
-      Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support
-      this message even before VHOST_USER_SET_FEATURES was called.
-
- * VHOST_USER_SET_OWNER
-
-      Id: 3
-      Equivalent ioctl: VHOST_SET_OWNER
-      Master payload: N/A
-
-      Issued when a new connection is established. It sets the current Master
-      as an owner of the session. This can be used on the Slave as a
-      "session start" flag.
-
- * VHOST_USER_RESET_OWNER
-
-      Id: 4
-      Master payload: N/A
-
-      This is no longer used. Used to be sent to request disabling
-      all rings, but some clients interpreted it to also discard
-      connection state (this interpretation would lead to bugs).
-      It is recommended that clients either ignore this message,
-      or use it to disable all rings.
-
- * VHOST_USER_SET_MEM_TABLE
-
-      Id: 5
-      Equivalent ioctl: VHOST_SET_MEM_TABLE
-      Master payload: memory regions description
-
-      Sets the memory map regions on the slave so it can translate the vring
-      addresses. In the ancillary data there is an array of file descriptors
-      for each memory mapped region. The size and ordering of the fds matches
-      the number and ordering of memory regions.
-
- * VHOST_USER_SET_LOG_BASE
-
-      Id: 6
-      Equivalent ioctl: VHOST_SET_LOG_BASE
-      Master payload: u64
-      Slave payload: N/A
-
-      Sets logging shared memory space.
-      When slave has VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol
-      feature, the log memory fd is provided in the ancillary data of
-      VHOST_USER_SET_LOG_BASE message, the size and offset of shared
-      memory area provided in the message.
-
-
- * VHOST_USER_SET_LOG_FD
-
-      Id: 7
-      Equivalent ioctl: VHOST_SET_LOG_FD
-      Master payload: N/A
-
-      Sets the logging file descriptor, which is passed as ancillary data.
-
- * VHOST_USER_SET_VRING_NUM
-
-      Id: 8
-      Equivalent ioctl: VHOST_SET_VRING_NUM
-      Master payload: vring state description
-
-      Set the size of the queue.
-
- * VHOST_USER_SET_VRING_ADDR
-
-      Id: 9
-      Equivalent ioctl: VHOST_SET_VRING_ADDR
-      Master payload: vring address description
-      Slave payload: N/A
-
-      Sets the addresses of the different aspects of the vring.
-
- * VHOST_USER_SET_VRING_BASE
-
-      Id: 10
-      Equivalent ioctl: VHOST_SET_VRING_BASE
-      Master payload: vring state description
-
-      Sets the base offset in the available vring.
-
- * VHOST_USER_GET_VRING_BASE
-
-      Id: 11
-      Equivalent ioctl: VHOST_USER_GET_VRING_BASE
-      Master payload: vring state description
-      Slave payload: vring state description
-
-      Get the available vring base offset.
-
- * VHOST_USER_SET_VRING_KICK
-
-      Id: 12
-      Equivalent ioctl: VHOST_SET_VRING_KICK
-      Master payload: u64
-
-      Set the event file descriptor for adding buffers to the vring. It
-      is passed in the ancillary data.
-      Bits (0-7) of the payload contain the vring index. Bit 8 is the
-      invalid FD flag. This flag is set when there is no file descriptor
-      in the ancillary data. This signals that polling should be used
-      instead of waiting for a kick.
-
- * VHOST_USER_SET_VRING_CALL
-
-      Id: 13
-      Equivalent ioctl: VHOST_SET_VRING_CALL
-      Master payload: u64
-
-      Set the event file descriptor to signal when buffers are used. It
-      is passed in the ancillary data.
-      Bits (0-7) of the payload contain the vring index. Bit 8 is the
-      invalid FD flag. This flag is set when there is no file descriptor
-      in the ancillary data. This signals that polling will be used
-      instead of waiting for the call.
-
- * VHOST_USER_SET_VRING_ERR
-
-      Id: 14
-      Equivalent ioctl: VHOST_SET_VRING_ERR
-      Master payload: u64
-
-      Set the event file descriptor to signal when error occurs. It
-      is passed in the ancillary data.
-      Bits (0-7) of the payload contain the vring index. Bit 8 is the
-      invalid FD flag. This flag is set when there is no file descriptor
-      in the ancillary data.
-
- * VHOST_USER_GET_QUEUE_NUM
-
-      Id: 17
-      Equivalent ioctl: N/A
-      Master payload: N/A
-      Slave payload: u64
-
-      Query how many queues the backend supports. This request should be
-      sent only when VHOST_USER_PROTOCOL_F_MQ is set in queried protocol
-      features by VHOST_USER_GET_PROTOCOL_FEATURES.
-
- * VHOST_USER_SET_VRING_ENABLE
-
-      Id: 18
-      Equivalent ioctl: N/A
-      Master payload: vring state description
-
-      Signal slave to enable or disable corresponding vring.
-      This request should be sent only when VHOST_USER_F_PROTOCOL_FEATURES
-      has been negotiated.
-
- * VHOST_USER_SEND_RARP
-
-      Id: 19
-      Equivalent ioctl: N/A
-      Master payload: u64
-
-      Ask vhost user backend to broadcast a fake RARP to notify the migration
-      is terminated for guest that does not support GUEST_ANNOUNCE.
-      Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
-      VHOST_USER_GET_FEATURES and protocol feature bit VHOST_USER_PROTOCOL_F_RARP
-      is present in VHOST_USER_GET_PROTOCOL_FEATURES.
-      The first 6 bytes of the payload contain the mac address of the guest to
-      allow the vhost user backend to construct and broadcast the fake RARP.
-
- * VHOST_USER_NET_SET_MTU
-
-      Id: 20
-      Equivalent ioctl: N/A
-      Master payload: u64
-
-      Set host MTU value exposed to the guest.
-      This request should be sent only when VIRTIO_NET_F_MTU feature has been
-      successfully negotiated, VHOST_USER_F_PROTOCOL_FEATURES is present in
-      VHOST_USER_GET_FEATURES and protocol feature bit
-      VHOST_USER_PROTOCOL_F_NET_MTU is present in
-      VHOST_USER_GET_PROTOCOL_FEATURES.
-      If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
-      with zero in case the specified MTU is valid, or non-zero otherwise.
-
- * VHOST_USER_SET_SLAVE_REQ_FD
-
-      Id: 21
-      Equivalent ioctl: N/A
-      Master payload: N/A
-
-      Set the socket file descriptor for slave initiated requests. It is passed
-      in the ancillary data.
-      This request should be sent only when VHOST_USER_F_PROTOCOL_FEATURES
-      has been negotiated, and protocol feature bit VHOST_USER_PROTOCOL_F_SLAVE_REQ
-      bit is present in VHOST_USER_GET_PROTOCOL_FEATURES.
-      If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
-      with zero for success, non-zero otherwise.
-
- * VHOST_USER_IOTLB_MSG
-
-      Id: 22
-      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
-      Master payload: struct vhost_iotlb_msg
-      Slave payload: u64
-
-      Send IOTLB messages with struct vhost_iotlb_msg as payload.
-      Master sends such requests to update and invalidate entries in the device
-      IOTLB. The slave has to acknowledge the request with sending zero as u64
-      payload for success, non-zero otherwise.
-      This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
-      has been successfully negotiated.
-
-Slave message types
--------------------
-
- * VHOST_USER_SLAVE_IOTLB_MSG
-
-      Id: 1
-      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
-      Slave payload: struct vhost_iotlb_msg
-      Master payload: N/A
-
-      Send IOTLB messages with struct vhost_iotlb_msg as payload.
-      Slave sends such requests to notify of an IOTLB miss, or an IOTLB
-      access failure. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated,
-      and slave set the VHOST_USER_NEED_REPLY flag, master must respond with
-      zero when operation is successfully completed, or non-zero otherwise.
-      This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
-      has been successfully negotiated.
-
-VHOST_USER_PROTOCOL_F_REPLY_ACK:
--------------------------------
-The original vhost-user specification only demands replies for certain
-commands. This differs from the vhost protocol implementation where commands
-are sent over an ioctl() call and block until the client has completed.
-
-With this protocol extension negotiated, the sender (QEMU) can set the
-"need_reply" [Bit 3] flag to any command. This indicates that
-the client MUST respond with a Payload VhostUserMsg indicating success or
-failure. The payload should be set to zero on success or non-zero on failure,
-unless the message already has an explicit reply body.
-
-The response payload gives QEMU a deterministic indication of the result
-of the command. Today, QEMU is expected to terminate the main vhost-user
-loop upon receiving such errors. In future, qemu could be taught to be more
-resilient for selective requests.
-
-For the message types that already solicit a reply from the client, the
-presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
-no behavioural change. (See the 'Communication' section for details.)