Use-after-free after flush in TCG accelerator
I believe I found a UAF in TCG that can lead to a guest VM escape. The
security list informed me "This can not be treated as a security issue."
and told me to post it here. I am looking at the 4.2.0 source code. The
issue requires a race, and I will try to describe it in terms of three
concurrent threads.
Thread A:
A1. qemu_tcg_cpu_thread_fn runs work loop
A2. qemu_wait_io_event => qemu_wait_io_event_common => process_queued_cpu_work
A3. start_exclusive critical section entered
A4. do_tb_flush is called, TB memory freed/re-allocated
A5. end_exclusive exits critical section
Thread B:
B1. qemu_tcg_cpu_thread_fn runs work loop
B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
B3. tcg_tb_alloc obtains a new TB
Thread C:
C1. qemu_tcg_cpu_thread_fn runs work loop
C2. cpu_exec_step_atomic executes
C3. TB obtained with tb_lookup__cpu_state or tb_gen_code
C4. start_exclusive critical section entered
C5. cpu_tb_exec executes the TB code
C6. end_exclusive exits critical section
Consider the following sequence of events:
B2 => B3 => C3 (same TB as B2) => A3 => A4 (TB freed) => A5 => B2 =>
B3 (re-allocates TB from B2) => C4 => C5 (freed/reused TB now executing) => C6
In short, because thread C obtains the TB before entering the critical
section but only executes it inside, there is no guarantee that the
pointer has not been "freed" (rather, the memory is marked as reusable),
and therefore a use-after-free occurs.
Since TCG-generated code can live in the same memory as the TB data
structure, the freed TB can be reused for code generated by TCG,
allowing an attacker to overwrite key pointer values. This could lead
to code execution on the host, outside of the TCG sandbox.
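To make the race window concrete, here is a deliberately racy,
self-contained analogue of the interleaving above (plain pthreads with
hypothetical names; not QEMU code). Thread C caches the shared pointer
before taking the exclusive lock, thread A frees and reallocates it
under the lock, and C then dereferences the stale pointer inside the
lock:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t excl = PTHREAD_MUTEX_INITIALIZER;
static int *tb;                      /* stands in for the shared TB */

static void *thread_a(void *arg)     /* the flushing thread */
{
    (void)arg;
    pthread_mutex_lock(&excl);       /* A3: start_exclusive() */
    free(tb);                        /* A4: do_tb_flush() frees ... */
    tb = calloc(1, sizeof(*tb));     /* ... and the pool is reused */
    pthread_mutex_unlock(&excl);     /* A5: end_exclusive() */
    return NULL;
}

static void *thread_c(void *arg)     /* cpu_exec_step_atomic() */
{
    (void)arg;
    int *cached = tb;                /* C3: TB obtained outside the lock */
    /* A3..A5 may run here and free 'cached' */
    pthread_mutex_lock(&excl);       /* C4: start_exclusive() */
    printf("%d\n", *cached);         /* C5: potential use-after-free */
    pthread_mutex_unlock(&excl);     /* C6: end_exclusive() */
    return NULL;
}

int main(void)
{
    pthread_t a, c;
    tb = calloc(1, sizeof(*tb));
    pthread_create(&c, NULL, thread_c, NULL);
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_join(a, NULL);
    pthread_join(c, NULL);
    return 0;
}

Run under AddressSanitizer or valgrind, the stale dereference in
thread_c may be flagged when the interleaving hits.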
The bug describes a race whereby cpu_exec_step_atomic can acquire a TB
which is invalidated by a tb_flush before we execute it. This doesn't
affect the other cpu_exec modes as a tb_flush by its nature can only
occur on a quiescent system. The race was described as:
B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
B3. tcg_tb_alloc obtains a new TB
C3. TB obtained with tb_lookup__cpu_state or tb_gen_code
(same TB as B2)
A3. start_exclusive critical section entered
A4. do_tb_flush is called, TB memory freed/re-allocated
A5. end_exclusive exits critical section
B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
B3. tcg_tb_alloc reallocates TB from B2
C4. start_exclusive critical section entered
C5. cpu_tb_exec executes the TB code that was freed in A4
The simplest fix is to widen the exclusive period to include the TB
lookup. As a result we can drop the complication of checking we are in
the exclusive region before we end it.
Signed-off-by: Alex Bennée <email address hidden>
Cc: Yifan <email address hidden>
Cc: Bug 1863025 <email address hidden>
---
accel/tcg/cpu-exec.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 2560c90eec7..d95c4848a47 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -240,6 +240,8 @@ void cpu_exec_step_atomic(CPUState *cpu)
uint32_t cf_mask = cflags & CF_HASH_MASK;
if (sigsetjmp(cpu->jmp_env, 0) == 0) {
+ start_exclusive();
+
tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
if (tb == NULL) {
mmap_lock();
@@ -247,8 +249,6 @@ void cpu_exec_step_atomic(CPUState *cpu)
mmap_unlock();
}
- start_exclusive();
-
/* Since we got here, we know that parallel_cpus must be true. */
parallel_cpus = false;
cc->cpu_exec_enter(cpu);
@@ -271,14 +271,15 @@ void cpu_exec_step_atomic(CPUState *cpu)
qemu_plugin_disable_mem_helpers(cpu);
}
- if (cpu_in_exclusive_context(cpu)) {
- /* We might longjump out of either the codegen or the
- * execution, so must make sure we only end the exclusive
- * region if we started it.
- */
- parallel_cpus = true;
- end_exclusive();
- }
+
+ /*
+ * As we start the exclusive region before codegen we must still
+ * be in the region if we longjump out of either the codegen or
+ * the execution.
+ */
+ g_assert(cpu_in_exclusive_context(cpu));
+ parallel_cpus = true;
+ end_exclusive();
}
struct tb_desc {
--
2.20.1
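For readability, the control flow of cpu_exec_step_atomic() after the
patch above condenses to roughly the following (elisions marked; see
the hunks for the real context):

    if (sigsetjmp(cpu->jmp_env, 0) == 0) {
        start_exclusive();            /* now taken BEFORE the lookup */

        tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
        if (tb == NULL) {
            mmap_lock();
            tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
            mmap_unlock();
        }
        /* ... */
        cpu_tb_exec(cpu, tb);         /* tb cannot be flushed under us */
        /* ... */
    }

    /* Whether we ran to completion or longjmp'd out of codegen or
     * execution, we are still inside the exclusive region. */
    g_assert(cpu_in_exclusive_context(cpu));
    parallel_cpus = true;
    end_exclusive();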
I've attached a variant of the suggested patch which simply expands the exclusive period. It's hard to test extensively as not many things use the EXCP_ATOMIC mechanism. Can I ask how you found the bug and if you can re-test with the suggested patch?
I found it just by launching an Ubuntu 19.10 live CD with the QXL driver. I will re-test this weekend.
The workaround I had is to check the number of TB flushes and to retry
obtaining the TB if the number changes. There is a penalty in the case
where the TB cache is flushed, but it should not degrade performance in
most cases. I think obtaining the lock earlier will slow down the VM if
EXCP_ATOMIC is used often.
Of course, I am assuming a TB flush is the only way to cause this bug.
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index d1c2b6ea1fd..d83b578299b 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -250,8 +250,11 @@ void cpu_exec_step_atomic(CPUState *cpu)
uint32_t flags;
uint32_t cflags = 1;
uint32_t cf_mask = cflags & CF_HASH_MASK;
+ unsigned flush_count;
if (sigsetjmp(cpu->jmp_env, 0) == 0) {
+retry:
+ flush_count = tb_flush_count();
tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
if (tb == NULL) {
mmap_lock();
@@ -260,6 +263,11 @@ void cpu_exec_step_atomic(CPUState *cpu)
}
start_exclusive();
+ /* do_tb_flush() might run and make tb invalid */
+ if (flush_count != tb_flush_count()) {
+ end_exclusive();
+ goto retry;
+ }
/* Since we got here, we know that parallel_cpus must be true. */
parallel_cpus = false;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 4ed9d0abaf2..ecf7d3b53ff 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2696,6 +2696,11 @@ void tcg_flush_softmmu_tlb(CPUState *cs)
#endif
}
+unsigned tb_flush_count(void)
+{
+ return atomic_read(&tb_ctx.tb_flush_count);
+}
+
#if defined(CONFIG_NO_RWX)
void tb_exec_memory_lock(void)
{
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 5ccc9485812..1bc61fa6d76 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -584,6 +584,7 @@ void tlb_set_dirty(CPUState *cpu, target_ulong vaddr);
void tb_flush_jmp_cache(CPUState *cpu, target_ulong addr);
/* translate-all.c */
+unsigned tb_flush_count(void);
#if defined(CONFIG_NO_RWX)
void tb_exec_memory_lock(void);
bool tb_is_exec(const TranslationBlock *tb);
Apologies, the patch got messed up.
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index c01f59c743..7a9e8c94bd 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -238,8 +238,11 @@ void cpu_exec_step_atomic(CPUState *cpu)
uint32_t flags;
uint32_t cflags = 1;
uint32_t cf_mask = cflags & CF_HASH_MASK;
+ unsigned flush_count;
if (sigsetjmp(cpu->jmp_env, 0) == 0) {
+retry:
+ flush_count = tb_flush_count();
tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
if (tb == NULL) {
mmap_lock();
@@ -248,6 +251,11 @@ void cpu_exec_step_atomic(CPUState *cpu)
}
start_exclusive();
+ /* do_tb_flush() might run and make tb invalid */
+ if (flush_count != tb_flush_count()) {
+ end_exclusive();
+ goto retry;
+ }
/* Since we got here, we know that parallel_cpus must be true. */
parallel_cpus = false;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 9f48da9472..2fb7da9b51 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2674,3 +2674,8 @@ void tcg_flush_softmmu_tlb(CPUState *cs)
tlb_flush(cs);
#endif
}
+
+unsigned tb_flush_count(void)
+{
+ return atomic_read(&tb_ctx.tb_flush_count);
+}
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index d85e610e85..aa3c2d219a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -579,6 +579,9 @@ void tlb_set_dirty(CPUState *cpu, target_ulong vaddr);
/* exec.c */
void tb_flush_jmp_cache(CPUState *cpu, target_ulong addr);
+/* translate-all.c */
+unsigned tb_flush_count(void);
+
MemoryRegionSection *
address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
hwaddr *xlat, hwaddr *plen,
What race are you thinking of in my patch? The obvious race I can
think of is benign:
Case 1:
A: does TB flush
B: read tb_flush_count
A: increment tb_flush_count
A: end_exclusive
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (increment seen)
B: retries
Case 2:
B: read tb_flush_count
A: does TB flush
A: increment tb_flush_count
A: end_exclusive
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (increment seen)
B: retries
Case 3:
A: does TB flush
A: increment tb_flush_count
A: end_exclusive
B: read tb_flush_count
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (no increment seen)
B: proceeds
Case 1 is the expected case. In Case 2 we thought the TB was stale when
it wasn't, so we get it again with tb_lookup__cpu_state at minimal
extra overhead.
Case 3 seems bad because we could read tb_flush_count and find it
already incremented. But if so, that means thread A has reached the end
of do_tb_flush: the lookup tables are already cleared and the TCG
context is already reset. So it should be safe for thread B to call
tb_lookup__cpu_state or tb_gen_code.
Yifan
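The structure defended above is essentially a seqlock-style generation
check: snapshot the flush counter, do the unlocked lookup, take the
lock, and retry if the counter moved. Because a flush only runs inside
the exclusive region and bumps the counter before releasing it, a
reader that sees the same counter value on both sides of its lookup
knows no flush overlapped it. A self-contained sketch of the pattern
(hypothetical names; not QEMU code):

#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t excl = PTHREAD_MUTEX_INITIALIZER;
static atomic_uint flush_count;     /* bumped by every flush */
static void *cached_tb;             /* stands in for the TB caches */

/* Flusher: only runs inside the exclusive region, like do_tb_flush(). */
void flush_tbs(void *fresh_tb)
{
    pthread_mutex_lock(&excl);
    cached_tb = fresh_tb;           /* free + reallocate in QEMU */
    atomic_fetch_add(&flush_count, 1);
    pthread_mutex_unlock(&excl);
}

/* Reader: returns a TB that no flush has invalidated; the exclusive
 * lock is still held on return (caller must release it). */
void *acquire_tb(void)
{
    for (;;) {
        unsigned seen = atomic_load(&flush_count);
        void *tb = cached_tb;       /* unlocked lookup, may be stale */
        pthread_mutex_lock(&excl);
        if (seen == atomic_load(&flush_count)) {
            return tb;              /* no flush overlapped (Case 3) */
        }
        pthread_mutex_unlock(&excl);/* counter moved (Cases 1/2): retry */
    }
}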
On Fri, Feb 14, 2020 at 3:31 PM Richard Henderson
<email address hidden> wrote:
>
> On 2/14/20 6:49 AM, Alex Bennée wrote:
> > The bug describes a race whereby cpu_exec_step_atomic can acquire a TB
> > which is invalidated by a tb_flush before we execute it. This doesn't
> > affect the other cpu_exec modes as a tb_flush by its nature can only
> > occur on a quiescent system. The race was described as:
> >
> > B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
> > B3. tcg_tb_alloc obtains a new TB
> >
> > C3. TB obtained with tb_lookup__cpu_state or tb_gen_code
> > (same TB as B2)
> >
> > A3. start_exclusive critical section entered
> > A4. do_tb_flush is called, TB memory freed/re-allocated
> > A5. end_exclusive exits critical section
> >
> > B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
> > B3. tcg_tb_alloc reallocates TB from B2
> >
> > C4. start_exclusive critical section entered
> > C5. cpu_tb_exec executes the TB code that was freed in A4
> >
> > The simplest fix is to widen the exclusive period to include the TB
> > lookup. As a result we can drop the complication of checking we are in
> > the exclusive region before we end it.
>
> I'm not 100% keen on having the tb_gen_code within the exclusive region. It
> implies a much larger delay on (at least) the first execution of the atomic
> operation.
>
> But I suppose until recently we had a global lock around code generation, and
> this is only slightly worse. Plus, it has the advantage of being dead simple,
> and without the races vs tb_ctx.tb_flush_count that exist in Yifan's patch.
>
> Applied to tcg-next.
>
>
> r~
Fixed here:
https://git.qemu.org/?p=qemu.git;a=commitdiff;h=886cc68943eb
CVE-2020-24165 was assigned to this:
https://nvd.nist.gov/vuln/detail/CVE-2020-24165
I had no involvement in the assignment, posting here for reference only.
On Thu, Aug 31, 2023 at 03:40:25PM +0200, Philippe Mathieu-Daudé wrote:
> Hi Samuel,
>
> On 31/8/23 14:48, Samuel Henrique wrote:
> > CVE-2020-24165 was assigned to this:
> > https://nvd.nist.gov/vuln/detail/CVE-2020-24165
> >
> > I had no involvement in the assignment, posting here for reference only.
> >
> > ** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2020-24165
>
> QEMU 4.2.0 was released in 2019. The issue you report
> has been fixed in commit 886cc68943 ("accel/tcg: fix race
> in cpu_exec_step_atomic (bug 1863025)") which is included
> in QEMU v5.0, released in April 2020, more than 3 years ago.
>
> What do you expect us to do here? I'm not sure whether assigning a
> CVE to 3-year-old code is a good use of engineering time.
In any case per our stated security policy, we do not consider TCG to
be providing a security boundary between host and guest, and thus bugs
in TCG aren't considered security flaws:
https://www.qemu.org/docs/master/system/security.html#non-virtualization-use-case
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, Aug 31, 2023 at 3:57 PM Daniel P. Berrangé <email address hidden> wrote:
>
> On Thu, Aug 31, 2023 at 03:40:25PM +0200, Philippe Mathieu-Daudé wrote:
> > Hi Samuel,
> >
> > On 31/8/23 14:48, Samuel Henrique wrote:
> > > CVE-2020-24165 was assigned to this:
> > > https://nvd.nist.gov/vuln/detail/CVE-2020-24165
> > >
> > > I had no involvement in the assignment, posting here for reference only.
> > >
> > > ** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2020-24165
> >
> > QEMU 4.2.0 was released in 2019. The issue you report
> > has been fixed in commit 886cc68943 ("accel/tcg: fix race
> > in cpu_exec_step_atomic (bug 1863025)") which is included
> > in QEMU v5.0, released in April 2020, more than 3 years ago.
> >
> > What do you expect us to do here? I'm not sure whether assigning a
> > CVE to 3-year-old code is a good use of engineering time.
>
> In any case per our stated security policy, we do not consider TCG to
> be providing a security boundary between host and guest, and thus bugs
> in TCG aren't considered security flaws:
>
> https://www.qemu.org/docs/master/system/security.html#non-virtualization-use-case
Right, and it is clearly indicated in the referenced launchpad bug:
'The security list informed me "This can not be treated as a security
issue"'.
This comes on top of CVE-2022-36648, which is also a mystery to me in
terms of CVE assignment and CVSS scoring (rated as Critical). I don't
know what's going on with NVD; there must be something wrong on their
side.
I disputed both CVEs via https://cveform.mitre.org/.
> With regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
>
--
Mauro Matteo Cascella
Red Hat Product Security
PGP-Key ID: BB3410B0