results/classifier/gemma3:12b/hypervisor/2882



Reading ACPI info via fw_cfg in SVSM causes Linux to panic
Description of problem:
We could use some help from a QEMU expert with a QEMU/ACPI/Linux-related problem
we hit while working on Coconut SVSM.

**See https://github.com/coconut-svsm/svsm/issues/646**

Coconut has code that reads ACPI information via fw_cfg and extracts the number of guest CPUs from it.
That code has been used in the past, but since IGVM became the default launch method for SVSM, it
was only used in corner cases, while the information was obtained in another way (an IGVM parameter).
Now a commit switches back to the fw_cfg+ACPI method. The information it returns is correct, but
strangely the Linux guest prints ACPI-related errors and, in some cases, crashes before reaching user space. We have no idea how this could be related other than through QEMU behavior.
There is no direct way for SVSM to influence the ACPI-related behavior of the Linux
guest kernel.
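
For readers unfamiliar with the approach, here is a minimal sketch of how a CPU count can be derived from ACPI data. It is not Coconut's actual code: the function name `count_enabled_cpus` and the helper `le32` are hypothetical, and it assumes the MADT (signature `APIC`) has already been read into a byte buffer, for example from the ACPI tables that QEMU exposes over fw_cfg. The entry layouts are those of the ACPI specification (type 0 = Processor Local APIC, type 9 = Processor Local x2APIC); only the "enabled" flag bit is checked.

```rust
/// Offset of the first interrupt controller structure in a MADT:
/// 36-byte ACPI table header + 4-byte local APIC address + 4-byte flags.
const MADT_ENTRIES_OFFSET: usize = 44;

/// Reads a little-endian u32 from the first four bytes of `b`.
fn le32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Counts enabled CPUs in a raw MADT table (hypothetical helper).
/// Returns `None` if the buffer is not a well-formed MADT.
fn count_enabled_cpus(madt: &[u8]) -> Option<usize> {
    if madt.len() < MADT_ENTRIES_OFFSET || &madt[0..4] != b"APIC" {
        return None;
    }
    // The total table length is stored at bytes 4..8 of the ACPI header.
    let len = le32(&madt[4..8]) as usize;
    if len < MADT_ENTRIES_OFFSET || len > madt.len() {
        return None;
    }

    let mut off = MADT_ENTRIES_OFFSET;
    let mut cpus = 0;
    while off + 2 <= len {
        let kind = madt[off];
        let entry_len = madt[off + 1] as usize;
        if entry_len < 2 || off + entry_len > len {
            return None; // malformed entry
        }
        let entry = &madt[off..off + entry_len];
        // Locate the flags field for the two CPU entry types.
        let flags = match (kind, entry_len) {
            (0, 8) => Some(le32(&entry[4..8])),   // Processor Local APIC
            (9, 16) => Some(le32(&entry[8..12])), // Processor Local x2APIC
            _ => None,                            // other entry types are skipped
        };
        if let Some(f) = flags {
            if f & 1 != 0 {
                cpus += 1; // bit 0 = enabled
            }
        }
        off += entry_len;
    }
    Some(cpus)
}
```

Note that a parser like this only reads from a buffer. Any side effect on the guest would have to come from how the table data is fetched from the hypervisor, not from the parsing itself.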

The problem seems to be caused by simply reading the ACPI data.

Even with the bad commit reverted, simply calling the original fw_cfg ACPI function is enough to cause problems for Linux.
Steps to reproduce:
Boot an SVSM-based CVM. SVSM and OVMF boot OK, then Linux prints these errors and, in some scenarios, panics:
```
[...]
[    1.857709] ACPI: Added _OSI(Processor Aggregator Device)
[    1.859436] ACPI: 1 ACPI AML tables successfully acquired and loaded
[    1.860867] ACPI Error: AE_BAD_ADDRESS, Unable to initialize fixed events (20240827/evevent-53)
[    1.862709] ACPI: Unable to start the ACPI Interpreter
[    1.863708] ACPI Error: Could not remove SCI handler (20240827/evmisc-251)
[    1.864942] ACPI Error: AE_BAD_PARAMETER, Thread 2176690624 could not acquire Mutex [ACPI_MTX_Namespace] (0x1) (20240827/utmutex-252)
[    1.866715] ACPI Error: AE_BAD_PARAMETER, Thread 2176690624 could not acquire Mutex [ACPI_MTX_Tables] (0x2) (20240827/utmutex-252)
[    1.869722] ACPI Error: Mutex [ACPI_MTX_Tables] (0x2) is not acquired, cannot release (20240827/utmutex-289)
[    1.870826] iommu: Default domain type: Translated
[    1.871710] iommu: DMA domain TLB invalidation policy: lazy mod
[...]
[    2.672635] io scheduler bfq registered
[    2.675679] atomic64_test: passed for x86-64 platform with CX8 and with SSE
[    2.677596] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[    2.679264] ------------[ cut here ]------------
[    2.680284] refcount_t: addition on 0; use-after-free.
[    2.681477] WARNING: CPU: 3 PID: 1 at lib/refcount.c:25 refcount_warn_saturate+0xe5/0x110
[    2.683261] Modules linked in:
[    2.683929] CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.6-200.fc41.x86_64 #1
[    2.685608] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-stable202502-39-gb483f751 02/02/2022
[    2.687729] RIP: 0010:refcount_warn_saturate+0xe5/0x110
[    2.688853] Code: e3 7f ff 0f 0b e9 fb 0a 8a 00 80 3d 15 9f 23 02 00 0f 85 5e ff ff ff 48 c7 c7 30 7b e7 8c c6 05 01 9f 23 02 01 e8 fb e2 7f ff <0f> 0b e9 d4 0a 8a 00 48 c7 c7 88 7b e7 8c c6 05 e5 9e 23 02 01 e8
[    2.692768] RSP: 0018:ffffb2ed0001fd90 EFLAGS: 00010282
[    2.693894] RAX: 0000000000000000 RBX: ffff975b81060a80 RCX: ffffffff8d967448
[    2.695410] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001
[    2.696923] RBP: ffffb2ed0001fe38 R08: 0000000000000000 R09: 0720072007200720
[    2.698439] R10: 0720072007200720 R11: 0720072007200720 R12: ffff975b81060a80
[    2.699955] R13: ffffb2ed0001fe78 R14: 00000000000000dc R15: 00000000000001df
[    2.701461] FS:  0000000000000000(0000) GS:ffff975cf7d80000(0000) knlGS:0000000000000000
[    2.703179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.704400] CR2: 00007f1c1658a3c8 CR3: 000800006082c000 CR4: 00000000003506f0
[    2.705910] Call Trace:
[    2.706451]  <TASK>
[    2.706922]  ? srso_return_thunk+0x5/0x5f
[    2.707820]  ? show_trace_log_lvl+0x255/0x2f0
[    2.708783]  ? show_trace_log_lvl+0x255/0x2f0
[    2.709712]  ? kobject_get+0x68/0x70
[    2.710492]  ? refcount_warn_saturate+0xe5/0x110
[    2.711480]  ? __warn.cold+0x93/0xfa
[    2.712268]  ? refcount_warn_saturate+0xe5/0x110
[    2.713262]  ? report_bug+0xff/0x140
[    2.714036]  ? handle_bug+0x58/0x90
[    2.714779]  ? exc_invalid_op+0x17/0x70
[    2.715617]  ? asm_exc_invalid_op+0x1a/0x20
[    2.716526]  ? refcount_warn_saturate+0xe5/0x110
[    2.717507]  kobject_get+0x68/0x70
[    2.718266]  kobject_add_internal+0x32/0x250
[    2.719196]  kobject_add+0x96/0xc0
[    2.719923]  kobject_create_and_add+0xa3/0xc0
[    2.720851]  bgrt_init+0x77/0xc0
[    2.721578]  ? __pfx_bgrt_init+0x10/0x10
[    2.722418]  do_one_initcall+0x5b/0x310
[    2.723272]  do_initcalls+0x147/0x170
[    2.724086]  ? __pfx_kernel_init+0x10/0x10
[    2.725174]  kernel_init_freeable+0xfb/0x130
[    2.726114]  kernel_init+0x1a/0x140
[    2.726883]  ret_from_fork+0x34/0x50
[    2.727679]  ? __pfx_kernel_init+0x10/0x10
[    2.728580]  ret_from_fork_asm+0x1a/0x30
[    2.729429]  </TASK>
[    2.729926] ---[ end trace 0000000000000000 ]---
```
Additional information: