APEI table generation and CPER records
======================================

..
   Copyright (c) 2020 HUAWEI TECHNOLOGIES CO., LTD.

   This work is licensed under the terms of the GNU GPL, version 2 or later.
   See the COPYING file in the top-level directory.

Design Details
--------------

::

         etc/acpi/tables                           etc/hardware_errors
      ====================                   ===============================
  + +--------------------------+            +----------------------------+
  | | HEST                     | +--------->|    error_block_address1    |------+
  | +--------------------------+ |          +----------------------------+      |
  | | GHES1                    | | +------->|    error_block_address2    |------+-+
  | +--------------------------+ | |        +----------------------------+      | |
  | | .................        | | |        |      ..............        |      | |
  | | error_status_address-----+-+ |        +----------------------------+      | |
  | | .................        |   |   +--->|    error_block_addressN    |------+-+---+
  | | read_ack_register--------+-+ |   |    +----------------------------+      | |   |
  | | read_ack_preserve        | +-+---+--->|     read_ack_register1     |      | |   |
  | | read_ack_write           |   |   |    +----------------------------+      | |   |
  + +--------------------------+   | +-+--->|     read_ack_register2     |      | |   |
  | | GHES2                    |   | | |    +----------------------------+      | |   |
  + +--------------------------+   | | |    |       .............        |      | |   |
  | | .................        |   | | |    +----------------------------+      | |   |
  | | error_status_address-----+---+ | | +->|     read_ack_registerN     |      | |   |
  | | .................        |     | | |  +----------------------------+      | |   |
  | | read_ack_register--------+-----+ | |  |Generic Error Status Block 1|<-----+ |   |
  | | read_ack_preserve        |       | |  |-+------------------------+-+        |   |
  | | read_ack_write           |       | |  | |          CPER          | |        |   |
  + +--------------------------+       | |  | |          CPER          | |        |   |
  | | ...............          |       | |  | |          ....          | |        |   |
  + +--------------------------+       | |  | |          CPER          | |        |   |
  | | GHESN                    |       | |  |-+------------------------+-|        |   |
  + +--------------------------+       | |  |Generic Error Status Block 2|<-------+   |
  | | .................        |       | |  |-+------------------------+-+            |
  | | error_status_address-----+-------+ |  | |           CPER         | |            |
  | | .................        |         |  | |           CPER         | |            |
  | | read_ack_register--------+---------+  | |           ....         | |            |
  | | read_ack_preserve        |            | |           CPER         | |            |
  | | read_ack_write           |            +-+------------------------+-+            |
  + +--------------------------+            |         ..........         |            |
                                            |----------------------------+            |
                                            |Generic Error Status Block N |<----------+
                                            |-+-------------------------+-+
                                            | |          CPER           | |
                                            | |          CPER           | |
                                            | |          ....           | |
                                            | |          CPER           | |
                                            +-+-------------------------+-+


(1) QEMU generates the ACPI HEST table. This table goes in the current
    "etc/acpi/tables" fw_cfg blob. Each error source has its own
    notification type.

(2) A new fw_cfg blob called "etc/hardware_errors" is introduced. QEMU
    also needs to populate this blob. The "etc/hardware_errors" fw_cfg blob
    contains an address registers table and an Error Status Data Block table.

(3) The address registers table contains N Error Block Address entries
    and N Read Ack Register entries. Each entry is 8 bytes in size.
    The Error Status Data Block table contains N Error Status Data Block
    entries. The size of each entry is defined in the source code as
    ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size
    of the "etc/hardware_errors" fw_cfg blob is
    (N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes.
    N is the number of hardware error source types.

(4) QEMU generates the ACPI linker/loader script for the firmware. The
    firmware pre-allocates memory for "etc/acpi/tables" and
    "etc/hardware_errors", and copies the blob contents there.

(5) QEMU generates N ADD_POINTER commands, which patch addresses in the
    "error_status_address" fields of the HEST table with a pointer to the
    corresponding "error_block_address" entry in the address registers
    table of the "etc/hardware_errors" blob.

(6) QEMU generates N ADD_POINTER commands, which patch addresses in the
    "read_ack_register" fields of the HEST table with a pointer to the
    corresponding "read_ack_register" within the "etc/hardware_errors" blob.

(7) QEMU generates N ADD_POINTER commands for the firmware, which patch
    addresses in the "error_block_address" fields with a pointer to the
    respective "Error Status Data Block" in the "etc/hardware_errors" blob.

(8) QEMU defines a third, write-only fw_cfg blob to store the location
    where the error block offsets, read ack registers and CPER records are
    stored.

    Up to QEMU 9.2, the location was at "etc/hardware_errors_addr", and
    contained a GPA for the beginning of "etc/hardware_errors".

    Newer versions place the location at "etc/acpi_table_hest_addr",
    pointing to the GPA of the HEST table.

    Using the above-mentioned fw_cfg files, the firmware can send the
    guest-side allocation addresses back to QEMU. Each file contains a
    single 8-byte entry. QEMU generates a single WRITE_POINTER command for
    the firmware. The firmware writes the start address of either
    "etc/hardware_errors" or the HEST table back to the corresponding
    fw_cfg file.

(9) When QEMU gets a SIGBUS from the kernel, QEMU writes the CPER into the
    corresponding "Error Status Data Block" in guest memory, and then injects
    a platform-specific interrupt (on the arm/virt machine, a Synchronous
    External Abort) to notify the guest.

(10) This notification (in virtual hardware) is handled by the guest
     kernel. On receiving the notification, the guest APEI driver can read
     the CPER error and take appropriate action.

(11) kvm_arch_on_sigbus_vcpu() reports RAS errors via SEA notifications
     when a SIGBUS event is triggered.
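
The size formula in step (3) and the pointer patching in step (7) can be
illustrated with a small Python model. This is only a sketch, not QEMU's
implementation: it assumes the blob is laid out in the order the diagram
shows (error block addresses, then read ack registers, then status blocks),
and that ADD_POINTER adds the target blob's guest address to a little-endian
offset already stored in the field. The function names and the example base
address are hypothetical.

```python
import struct

ADDR_ENTRY_SIZE = 8                    # each address register entry is 8 bytes
ACPI_GHES_MAX_RAW_DATA_LENGTH = 1024   # size of one status block, per step (3)


def hardware_errors_layout(n):
    """Return per-source offsets inside "etc/hardware_errors" and the total size.

    Assumed order (taken from the diagram): N error_block_address entries,
    N read_ack_register entries, then N Error Status Data Blocks.
    """
    error_block_addr = [i * ADDR_ENTRY_SIZE for i in range(n)]
    read_ack = [(n + i) * ADDR_ENTRY_SIZE for i in range(n)]
    status_base = 2 * n * ADDR_ENTRY_SIZE
    status_block = [status_base + i * ACPI_GHES_MAX_RAW_DATA_LENGTH
                    for i in range(n)]
    total = status_base + n * ACPI_GHES_MAX_RAW_DATA_LENGTH
    return error_block_addr, read_ack, status_block, total


def add_pointer(blob, offset, target_base_gpa):
    """Model of ADD_POINTER: add the target's GPA to the 8-byte value at offset."""
    (value,) = struct.unpack_from("<Q", blob, offset)
    struct.pack_into("<Q", blob, offset, value + target_base_gpa)


def build_and_patch(n, base_gpa):
    """Build a zeroed blob, store blob-relative offsets, then patch them (step 7)."""
    eba, _ack, blocks, total = hardware_errors_layout(n)
    blob = bytearray(total)
    for i in range(n):
        # Before patching, the field holds the offset of block i in the blob.
        struct.pack_into("<Q", blob, eba[i], blocks[i])
        # After patching, it holds the guest physical address of block i.
        add_pointer(blob, eba[i], base_gpa)
    return blob
```

For N = 2 the model gives a total of 2 * 8 * 2 + 2 * 1024 = 2080 bytes, with
the second status block starting at offset 1056.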
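
Steps (9) and (10) can be sketched in the same spirit. The model below is
hypothetical and makes assumptions this document does not state: that a
record is written only when the read ack register is non-zero, that writing
a record clears the register, and that the guest acknowledges by applying
the "read_ack_preserve" and "read_ack_write" masks as
``(value & preserve) | write``. The ``GuestMemory`` class, the function
names and the mask values in the usage note are illustrative only.

```python
import struct


class GuestMemory:
    """Toy stand-in for guest physical memory, addressed from GPA 0."""

    def __init__(self, size):
        self.mem = bytearray(size)

    def read_u64(self, gpa):
        return struct.unpack_from("<Q", self.mem, gpa)[0]

    def write_u64(self, gpa, value):
        struct.pack_into("<Q", self.mem, gpa, value)

    def write(self, gpa, data):
        self.mem[gpa:gpa + len(data)] = data


def record_error(mem, error_block_address_gpa, read_ack_gpa, cper, max_len=1024):
    """Write one record into the status block; return True if it was written."""
    if len(cper) > max_len:
        raise ValueError("record larger than the status block")
    if mem.read_u64(read_ack_gpa) == 0:
        # Assumed policy: the previous record has not been acknowledged yet.
        return False
    mem.write_u64(read_ack_gpa, 0)                     # record is now pending
    block_gpa = mem.read_u64(error_block_address_gpa)  # follow patched pointer
    mem.write(block_gpa, cper)
    return True   # the caller would now inject the notification (step 9)


def guest_ack(mem, read_ack_gpa, preserve, write):
    """Guest side: acknowledge the record using the two HEST masks."""
    value = mem.read_u64(read_ack_gpa)
    mem.write_u64(read_ack_gpa, (value & preserve) | write)
```

With example masks of 0xFFFFFFFE for preserve and 0x1 for write, a call to
``guest_ack`` sets the register back to 1, so the next ``record_error`` call
succeeds.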