diff options
| author | Peter Maydell <peter.maydell@linaro.org> | 2019-05-10 14:49:36 +0100 |
|---|---|---|
| committer | Peter Maydell <peter.maydell@linaro.org> | 2019-05-10 14:49:36 +0100 |
| commit | efb4f3b62c69383a7308d7b739a3193e7c0ccae8 (patch) | |
| tree | d39006b859bec039e9c08f44c9191cab4f053fa5 /docs | |
| parent | 5f022626c8807e5848af223908d533973a4b2922 (diff) | |
| parent | e84125761f78919fe63616d9888ea45e72dc956f (diff) | |
| download | focaccia-qemu-efb4f3b62c69383a7308d7b739a3193e7c0ccae8.tar.gz focaccia-qemu-efb4f3b62c69383a7308d7b739a3193e7c0ccae8.zip | |
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Pull request # gpg: Signature made Fri 10 May 2019 14:02:22 BST # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: docs: add Security chapter to the documentation docs: add Secure Coding Practices to developer docs aio-posix: ensure poll mode is left when aio_notify is called block/io.c: fix for the allocation failure block: Add coroutine_fn to bdrv_check_co_entry util/readline: Add braces to fix checkpatch errors util: readline: replace tab indent by four spaces to fix checkpatch errors util/readline: add a space to fix errors by checkpatch tool Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/devel/index.rst | 1 | ||||
| -rw-r--r-- | docs/devel/secure-coding-practices.rst | 106 | ||||
| -rw-r--r-- | docs/security.texi | 131 |
3 files changed, 238 insertions, 0 deletions
diff --git a/docs/devel/index.rst b/docs/devel/index.rst index ebbab636ce..2a4ddf40ad 100644 --- a/docs/devel/index.rst +++ b/docs/devel/index.rst @@ -20,3 +20,4 @@ Contents: stable-process testing decodetree + secure-coding-practices diff --git a/docs/devel/secure-coding-practices.rst b/docs/devel/secure-coding-practices.rst new file mode 100644 index 0000000000..cbfc8af67e --- /dev/null +++ b/docs/devel/secure-coding-practices.rst @@ -0,0 +1,106 @@ +======================= +Secure Coding Practices +======================= +This document covers topics that both developers and security researchers must +be aware of so that they can develop safe code and audit existing code +properly. + +Reporting Security Bugs +----------------------- +For details on how to report security bugs or ask questions about potential +security bugs, see the `Security Process wiki page +<https://wiki.qemu.org/SecurityProcess>`_. + +General Secure C Coding Practices +--------------------------------- +Most CVEs (security bugs) reported against QEMU are not specific to +virtualization or emulation. They are simply C programming bugs. Therefore +it's critical to be aware of common classes of security bugs. + +There is a wide selection of resources available covering secure C coding. For +example, the `CERT C Coding Standard +<https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard>`_ +covers the most important classes of security bugs. + +Instead of describing them in detail here, only the names of the most important +classes of security bugs are mentioned: + +* Buffer overflows +* Use-after-free and double-free +* Integer overflows +* Format string vulnerabilities + +Some of these classes of bugs can be detected by analyzers. Static analysis is +performed regularly by Coverity and the most obvious of these bugs are even +reported by compilers. Dynamic analysis is possible with valgrind, tsan, and +asan. + +Input Validation +---------------- +Inputs from the guest or external sources (e.g. network, files) cannot be +trusted and may be invalid. Inputs must be checked before using them in a way +that could crash the program, expose host memory to the guest, or otherwise be +exploitable by an attacker. + +The most sensitive attack surface is device emulation. All hardware register +accesses and data read from guest memory must be validated. A typical example +is a device that contains multiple units that are selectable by the guest via +an index register:: + + typedef struct { + ProcessingUnit unit[2]; + ... + } MyDeviceState; + + static void mydev_writel(void *opaque, uint32_t addr, uint32_t val) + { + MyDeviceState *mydev = opaque; + ProcessingUnit *unit; + + switch (addr) { + case MYDEV_SELECT_UNIT: + unit = &mydev->unit[val]; <-- this input wasn't validated! + ... + } + } + +If ``val`` is not in range [0, 1] then an out-of-bounds memory access will take +place when ``unit`` is dereferenced. The code must check that ``val`` is 0 or +1 and handle the case where it is invalid. + +Unexpected Device Accesses +-------------------------- +The guest may access device registers in unusual orders or at unexpected +moments. Device emulation code must not assume that the guest follows the +typical "theory of operation" presented in driver writer manuals. The guest +may make nonsense accesses to device registers such as starting operations +before the device has been fully initialized. + +A related issue is that device emulation code must be prepared for unexpected +device register accesses while asynchronous operations are in progress. A +well-behaved guest might wait for a completion interrupt before accessing +certain device registers. Device emulation code must handle the case where the +guest overwrites registers or submits further requests before an ongoing +request completes. Unexpected accesses must not cause memory corruption or +leaks in QEMU. + +Invalid device register accesses can be reported with +``qemu_log_mask(LOG_GUEST_ERROR, ...)``. The ``-d guest_errors`` command-line +option enables these log messages. + +Live Migration +-------------- +Device state can be saved to disk image files and shared with other users. +Live migration code must validate inputs when loading device state so an +attacker cannot gain control by crafting invalid device states. Device state +is therefore considered untrusted even though it is typically generated by QEMU +itself. + +Guest Memory Access Races +------------------------- +Guests with multiple vCPUs may modify guest RAM while device emulation code is +running. Device emulation code must copy in descriptors and other guest RAM +structures and only process the local copy. This prevents +time-of-check-to-time-of-use (TOCTOU) race conditions that could cause QEMU to +crash when a vCPU thread modifies guest RAM while device emulation is +processing it. diff --git a/docs/security.texi b/docs/security.texi new file mode 100644 index 0000000000..927764f1e6 --- /dev/null +++ b/docs/security.texi @@ -0,0 +1,131 @@ +@node Security +@chapter Security + +@section Overview + +This chapter explains the security requirements that QEMU is designed to meet +and principles for securely deploying QEMU. + +@section Security Requirements + +QEMU supports many different use cases, some of which have stricter security +requirements than others. The community has agreed on the overall security +requirements that users may depend on. These requirements define what is +considered supported from a security perspective. + +@subsection Virtualization Use Case + +The virtualization use case covers cloud and virtual private server (VPS) +hosting, as well as traditional data center and desktop virtualization. These +use cases rely on hardware virtualization extensions to execute guest code +safely on the physical CPU at close-to-native speed. + +The following entities are untrusted, meaning that they may be buggy or +malicious: + +@itemize +@item Guest +@item User-facing interfaces (e.g. VNC, SPICE, WebSocket) +@item Network protocols (e.g. NBD, live migration) +@item User-supplied files (e.g. disk images, kernels, device trees) +@item Passthrough devices (e.g. PCI, USB) +@end itemize + +Bugs affecting these entities are evaluated on whether they can cause damage in +real-world use cases and treated as security bugs if this is the case. + +@subsection Non-virtualization Use Case + +The non-virtualization use case covers emulation using the Tiny Code Generator +(TCG). In principle the TCG and device emulation code used in conjunction with +the non-virtualization use case should meet the same security requirements as +the virtualization use case. However, for historical reasons much of the +non-virtualization use case code was not written with these security +requirements in mind. + +Bugs affecting the non-virtualization use case are not considered security +bugs at this time. Users with non-virtualization use cases must not rely on +QEMU to provide guest isolation or any security guarantees. + +@section Architecture + +This section describes the design principles that ensure the security +requirements are met. + +@subsection Guest Isolation + +Guest isolation is the confinement of guest code to the virtual machine. When +guest code gains control of execution on the host this is called escaping the +virtual machine. Isolation also includes resource limits such as throttling of +CPU, memory, disk, or network. Guests must be unable to exceed their resource +limits. + +QEMU presents an attack surface to the guest in the form of emulated devices. +The guest must not be able to gain control of QEMU. Bugs in emulated devices +could allow malicious guests to gain code execution in QEMU. At this point the +guest has escaped the virtual machine and is able to act in the context of the +QEMU process on the host. + +Guests often interact with other guests and share resources with them. A +malicious guest must not gain control of other guests or access their data. +Disk image files and network traffic must be protected from other guests unless +explicitly shared between them by the user. + +@subsection Principle of Least Privilege + +The principle of least privilege states that each component only has access to +the privileges necessary for its function. In the case of QEMU this means that +each process only has access to resources belonging to the guest. + +The QEMU process should not have access to any resources that are inaccessible +to the guest. This way the guest does not gain anything by escaping into the +QEMU process since it already has access to those same resources from within +the guest. + +Following the principle of least privilege immediately fulfills guest isolation +requirements. For example, guest A only has access to its own disk image file +@code{a.img} and not guest B's disk image file @code{b.img}. + +In reality certain resources are inaccessible to the guest but must be +available to QEMU to perform its function. For example, host system calls are +necessary for QEMU but are not exposed to guests. A guest that escapes into +the QEMU process can then begin invoking host system calls. + +New features must be designed to follow the principle of least privilege. +Should this not be possible for technical reasons, the security risk must be +clearly documented so users are aware of the trade-off of enabling the feature. + +@subsection Isolation mechanisms + +Several isolation mechanisms are available to realize this architecture of +guest isolation and the principle of least privilege. With the exception of +Linux seccomp, these mechanisms are all deployed by management tools that +launch QEMU, such as libvirt. They are also platform-specific so they are only +described briefly for Linux here. + +The fundamental isolation mechanism is that QEMU processes must run as +unprivileged users. Sometimes it seems more convenient to launch QEMU as +root to give it access to host devices (e.g. @code{/dev/net/tun}) but this poses a +huge security risk. File descriptor passing can be used to give an otherwise +unprivileged QEMU process access to host devices without running QEMU as root. +It is also possible to launch QEMU as a non-root user and configure UNIX groups +for access to @code{/dev/kvm}, @code{/dev/net/tun}, and other device nodes. +Some Linux distros already ship with UNIX groups for these devices by default. + +@itemize +@item SELinux and AppArmor make it possible to confine processes beyond the +traditional UNIX process and file permissions model. They restrict the QEMU +process from accessing processes and files on the host system that are not +needed by QEMU. + +@item Resource limits and cgroup controllers provide throughput and utilization +limits on key resources such as CPU time, memory, and I/O bandwidth. + +@item Linux namespaces can be used to make process, file system, and other system +resources unavailable to QEMU. A namespaced QEMU process is restricted to only +those resources that were granted to it. + +@item Linux seccomp is available via the QEMU @option{--sandbox} option. It disables +system calls that are not needed by QEMU, thereby reducing the host kernel +attack surface. +@end itemize |