If the Tiny Code Generator (TCG) of QEMU makes an error in the binary translation when emulating an instruction set architecture, we call it a semantic mistranslation bug. Here are two GitLab issues in TOML format. Tell me whether each one describes a semantic mistranslation bug. Answer only with yes or no.

1. issue:
id = 1371
title = "x86 BLSMSK semantic bug"
state = "closed"
created_at = "2022-12-16T06:43:29.794Z"
closed_at = "2023-03-01T01:08:38.844Z"
labels = ["Closed::Fixed", "accel: TCG", "target: i386"]
url = "https://gitlab.com/qemu-project/qemu/-/issues/1371"
host-os = "Windows 10 20H2"
host-arch = "x86"
qemu-version = "7.1.90 (v7.2.0-rc0)"
guest-os = "None"
guest-arch = "x86"
description = """The result of the instruction BLSMSK differs from the real CPU: the value of CF is different."""
reproduce = """1. Compile this code
void main() {
    asm("mov rax, 0x65b2e276ad27c67");
    asm("mov rbx, 0x62f34955226b2b5d");
    asm("blsmsk eax, ebx");
}

2. Execute and compare the result with the CPU.
    - CPU
        - CF = 0
    - QEMU
        - CF = 1"""
additional = """This bug is discovered by research conducted by KAIST SoftSec."""
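A note on the repro above: the inline assembly is Intel syntax, so it only compiles with `gcc -masm=intel`. The architectural semantics of BLSMSK (dest = src XOR (src - 1), with CF set only when the source is zero) can be sketched in Python to check the expected flag; the helper name below is illustrative, not part of the issue:

```python
def blsmsk(src, width=32):
    """Model of x86 BLSMSK: set every bit up to and including the
    lowest set bit of src; CF is set only when src is zero."""
    mask = (1 << width) - 1
    src &= mask
    dest = (src ^ (src - 1)) & mask
    cf = 1 if src == 0 else 0
    return dest, cf

# ebx holds 0x226b2b5d (the low 32 bits of 0x62f34955226b2b5d); it is
# non-zero, so the architecturally correct flag value is CF = 0.
```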

2. issue:
id = 1057
title = "AArch64: ISV is set to 1 in ESR_EL2 when taking a data abort with post-indexed instructions"
state = "closed"
created_at = "2022-06-02T20:50:04.695Z"
closed_at = "2022-07-18T16:47:38.098Z"
labels = ["accel: TCG", "target: arm", "workflow::In Progress"]
url = "https://gitlab.com/qemu-project/qemu/-/issues/1057"
host-os = "MacOS 12.4"
host-arch = "ARM64"
qemu-version = "QEMU emulator version 7.0.0"
guest-os = "Custom - BedRock Hypervisor running with the NOVA microkernel"
guest-arch = "ARMv8"
description = """I think that I have a Qemu bug in my hands, but, I could still be missing something. Consider the following instruction:
0x0000000000000000:  C3 44 00 B8    str   w3, [x6], #4

notice the last #4, I think this is what we would call a post-indexed instruction (falls into the category of instructions with writeback). As I understand it, those instructions should not have ISV=1 in ESR_EL2 when faulting.

Here is the relevant part of the manual:

For other faults reported in ESR_EL2, ISV is 0 except for the following stage 2 aborts:
• AArch64 loads and stores of a single general-purpose register (including the register specified with 0b11111, including those with Acquire/Release semantics, but excluding Load Exclusive or Store Exclusive and excluding those with writeback).


However, I can see that Qemu sets ISV to 1 here. The ARM hardware that I tested gave me a value of ISV=0 for similar instructions.

Another example of instruction: 0x00000000000002f8:  01 1C 40 38    ldrb  w1, [x0, #1]!"""
reproduce = """1. Run some hypervisor in EL2
2. Create a guest running at EL1 that executes one of the mentioned instructions (and make the instruction fault by writing to some unmapped page in SLP)
3. Observe the value of ESR_EL2 on data abort

Unfortunately, I cannot provide an image to reproduce this (the software is not open-source). But, I would be happy to help test a patch."""
additional = "n/a"
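The manual rule quoted in the report reduces to a small predicate; a sketch with a hypothetical function name, assuming the instruction's properties are already decoded:

```python
def isv_should_be_set(single_gp_reg, has_writeback, is_exclusive):
    # Per the Arm ARM rule quoted above: ISV may be 1 only for loads and
    # stores of a single general-purpose register, excluding Load/Store
    # Exclusive and excluding writeback (pre/post-indexed) forms.
    return single_gp_reg and not has_writeback and not is_exclusive

# str w3, [x6], #4 is post-indexed (writeback), so ISV must be 0;
# QEMU reporting ISV=1 for it is the behavior the report questions.
```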

-----------------------------------------------------------------------------------------

yes

no

-----------------------------------------------------------------------------------------

Now I will give you mailing threads. Transform them into the TOML format I provided, which should be your only output. If a variable is not given in the mails, write "n/a". The IDs should start at 10000 and count upwards. The state is "closed" when the bug is fixed and "open" when it is not. Additionally, add a 'mistranslation' variable, which is set to 'yes' if the bug is a mistranslation bug and 'no' if not. First Thread:

On Tue, Apr 29, 2025 at 01:55:59PM +0800, Xiaoyao Li wrote:
> Date: Tue, 29 Apr 2025 13:55:59 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [Bug] QEMU TCG warnings after commit c6bd2dd63420 - HTT /
>  CMP_LEG bits
> 
> On 4/29/2025 11:02 AM, Ewan Hai wrote:
> > Hi Community,
> > 
> > This email contains 3 bugs that appear to share the same root cause.
> > 
> > [1] We ran into the following warnings when running QEMU v10.0.0 in TCG
> > mode:
> > 
> > qemu-system-x86_64 \
> >    -machine q35 \
> >    -m 4G -smp 4 \
> >    -kernel ./arch/x86/boot/bzImage \
> >    -bios /usr/share/ovmf/OVMF.fd \
> >    -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \
> >    -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \
> >    -nographic \
> >    -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr'
> > 
> > qemu-system-x86_64: warning: TCG doesn't support requested feature:
> > CPUID.01H:EDX.ht [bit 28]
> > qemu-system-x86_64: warning: TCG doesn't support requested feature:
> > CPUID.80000001H:ECX.cmp-legacy [bit 1]
> > (repeats 4 times, once per vCPU)
> > 
> > Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up
> > CPUID_HT in x86_cpu_expand_features() instead of cpu_x86_cpuid()" is
> > what introduced the warnings.
> > 
> > Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28])
> > and CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT
> > support, these bits trigger the warnings above.
> > 
> > [2] Also, Zhao pointed me to a similar report on GitLab:
> > https://gitlab.com/qemu-project/qemu/-/issues/2894
> > The symptoms there look identical to what we're seeing.
> > 
> > By convention we file one issue per email, but these two appear to share
> > the same root cause, so I'm describing them together here.
> 
> It was caused by my two patches. I think the fix can be as follow.
> If no objection from the community, I can submit the formal patch.
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 1f970aa4daa6..fb95aadd6161 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -776,11 +776,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t
> vendor1,
>            CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC | CPUID_SEP | \
>            CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \
>            CPUID_PSE36 | CPUID_CLFLUSH | CPUID_ACPI | CPUID_MMX | \
> -          CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE)
> +          CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE | \
> +          CPUID_HT)
>            /* partly implemented:
>            CPUID_MTRR, CPUID_MCA, CPUID_CLFLUSH (needed for Win64) */
>            /* missing:
> -          CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_HT, CPUID_TM, CPUID_PBE */
> +          CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_TM, CPUID_PBE */
> 
>  /*
>   * Kernel-only features that can be shown to usermode programs even if
> @@ -848,7 +849,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t
> vendor1,
> 
>  #define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \
>            CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A | \
> -          CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES)
> +          CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES | \
> +          CPUID_EXT3_CMP_LEG)
> 
>  #define TCG_EXT4_FEATURES 0

This fix is fine for me...at least from SDM, HTT depends on topology and
it should exist when user sets "-smp 4".

> > [3] My colleague Alan noticed what appears to be a related problem: if
> > we launch a guest with '-cpu <model>,-ht --enable-kvm', which means
> > explicitly removing the ht flag, but the guest still reports HT(cat
> > /proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht
> > bit seems to be forced on even when the user tries to disable it.
> 
> XiaoYao reminded me that issue [3] stems from a different patch. Please
> ignore it for now—I'll start a separate thread to discuss that one
> independently.

I haven't found any other thread :-).

By the way, just curious: in what cases do you need to disable the HT
flag? "-smp 4" means 4 cores with 1 thread per core; is that not
enough?

As for the “-ht” behavior, I'm also unsure whether this should be fixed
or not - one possible consideration is whether “-ht” would be useful.


-----------------------------------------------------------------------------------------

[[issue]]
id = 10000
title = "QEMU TCG warnings after commit c6bd2dd63420 - HTT / CMP_LEG bits"
state = "closed"
created_at = "2025-04-29T11:02:00.000Z"
closed_at = "n/a"
labels = ["accel: TCG", "target: i386"]
url = "n/a"
host-os = "n/a"
host-arch = "x86"
qemu-version = "10.0.0"
guest-os = "Linux"
guest-arch = "x86"
description = """TCG unconditionally advertises HTT (CPUID 1 EDX[28]) and CMP_LEG (CPUID 8000_0001 ECX[1]) despite lacking SMT support, causing warnings."""
reproduce = """Run QEMU in TCG mode with a Linux guest and observe CPUID-related warnings."""
additional = """Related to GitLab issue #2894. Fix involves adjusting CPU feature flags in target/i386/cpu.c."""
mistranslation = "no"

[[issue]]
id = 10001
title = "KVM forces HT flag even when explicitly disabled"
state = "open"
created_at = "2025-04-29T11:02:00.000Z"
closed_at = "n/a"
labels = ["accel: KVM", "target: i386"]
url = "n/a"
host-os = "n/a"
host-arch = "x86"
qemu-version = "10.0.0"
guest-os = "Linux"
guest-arch = "x86"
description = """When launching a guest with '-cpu <model>,-ht --enable-kvm', the HT flag remains enabled in /proc/cpuinfo."""
reproduce = """Launch a KVM guest with HT explicitly disabled and check CPU flags."""
additional = """Separate issue from the TCG warnings. Discussion pending in another thread."""
mistranslation = "no"
Second Thread:
This bug was fixed in the package qemu - 1:9.0.2+ds-4ubuntu5.3

-----------------------------------------------------------------------------------------

qemu (1:9.0.2+ds-4ubuntu5.3) oracular; urgency=medium

  * d/p/u/lp2049698/*: Add full boot order support on s390x (LP: #2049698)
  * Cherry-pick prerequisite for above backport (to avoid FTBFS):
    - d/p/u/lp2049698/0-hw-s390x-sclp.c-include-s390-virtio-ccw.h-to-make.patch
  * d/qemu-system-data.links: symlink s390-netboot.img -> s390-ccw.img for
    backwards compatibility, as the code is now combined.
  * Fix qemu-aarch64-static segfaults running ldconfig.real (LP: #2072564)
    - lp-2072564-01-linux-user-Honor-elf-alignment-when-placing-images.patch
    - lp-2072564-02-elfload-Fix-alignment-when-unmapping-excess-reservat.patch
    Thanks to Dimitry Andric for identifying the fix.

 -- Lukas Märdian <slyon@ubuntu.com>  Thu, 13 Mar 2025 17:18:50 +0100

** Changed in: qemu (Ubuntu Oracular)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/2072564

Title:
  qemu-aarch64-static segfaults running ldconfig.real (amd64 host)

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Noble:
  Fix Committed
Status in qemu source package in Oracular:
  Fix Released

Bug description:
  [ Impact ]

   * QEMU crashes when running (emulating) ldconfig in a Ubuntu 22.04
  arm64 guest

   * This affects the qemu-user-static 1:8.2.2+ds-0ubuntu1 package on
  Ubuntu 24.04+, running on a amd64 host.

   * When running docker containers with Ubuntu 22.04 in them, emulating
  arm64 with qemu-aarch64-static, invocations of ldconfig (actually
  ldconfig.real) segfault, leading to problems when loading shared
  libraries.

  [ Test Plan ]

   * Reproducer is very easy:

  $ sudo snap install docker
  docker 27.5.1 from Canonical** installed
  $ docker run -ti --platform linux/arm64/v8 ubuntu:22.04
  Unable to find image 'ubuntu:22.04' locally
  22.04: Pulling from library/ubuntu
  0d1c17d4e593: Pull complete 
  Digest: 
sha256:ed1544e454989078f5dec1bfdabd8c5cc9c48e0705d07b678ab6ae3fb61952d2
  Status: Downloaded newer image for ubuntu:22.04

  # Execute ldconfig.real inside the arm64 guest.
  # This should not crash after the fix!
  root@ad80af5378dc:/# /sbin/ldconfig.real
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault (core dumped)

  [ Where problems could occur ]

   * This changes the alignment of sections in the ELF binary via QEMUs
  elfloader, if something goes wrong with this change, it could lead to
  all kind of crashes (segfault) of any emulated binaries.

  [ Other Info ]

   * Upstream bug: https://gitlab.com/qemu-project/qemu/-/issues/1913
   * Upstream fix: https://gitlab.com/qemu-project/qemu/-/commit/4b7b20a3
     - Fix dependency (needed for QEMU < 9.20): 
https://gitlab.com/qemu-project/qemu/-/commit/c81d1faf

  --- original bug report ---

  
  This affects the qemu-user-static 1:8.2.2+ds-0ubuntu1 package on Ubuntu 
24.04, running on a amd64 host.

  When running docker containers with Ubuntu 22.04 in them, emulating
  arm64 with qemu-aarch64-static, invocations of ldconfig (actually
  ldconfig.real) segfault. For example:

  $ docker run -ti --platform linux/arm64/v8 ubuntu:22.04
  root@8861ff640a1c:/# /sbin/ldconfig.real
  Segmentation fault

  If you copy the ldconfig.real binary to the host, and run it directly
  via qemu-aarch64-static:

  $ gdb --args qemu-aarch64-static ./ldconfig.real
  GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
  Copyright (C) 2024 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  Type "show copying" and "show warranty" for details.
  This GDB was configured as "x86_64-linux-gnu".
  Type "show configuration" for configuration details.
  For bug reporting instructions, please see:
  <https://www.gnu.org/software/gdb/bugs/>.
  Find the GDB manual and other documentation resources online at:
      <http://www.gnu.org/software/gdb/documentation/>.

  For help, type "help".
  Type "apropos word" to search for commands related to "word"...
  Reading symbols from qemu-aarch64-static...
  Reading symbols from 
/home/dim/.cache/debuginfod_client/86579812b213be0964189499f62f176bea817bf2/debuginfo...
  (gdb) r
  Starting program: /usr/bin/qemu-aarch64-static ./ldconfig.real
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
  [New Thread 0x7ffff76006c0 (LWP 28378)]

  Thread 1 "qemu-aarch64-st" received signal SIGSEGV, Segmentation fault.
  0x00007fffe801645b in ?? ()
  (gdb) disassemble
  No function contains program counter for selected frame.

  It looks like this is a known qemu regression after v8.1.1:
  https://gitlab.com/qemu-project/qemu/-/issues/1913

  Downgrading the package to qemu-user-
  static_8.0.4+dfsg-1ubuntu3_amd64.deb fixes the segfault.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/2072564/+subscriptions
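The upstream fix referenced above ("Honor elf alignment when placing images") enforces that the guest image is placed at an address satisfying the largest PT_LOAD p_align, not merely the host page size. A rough sketch of that rule (hypothetical helper, not QEMU's code):

```python
def image_alignment(phdrs, page_size=4096):
    # Rule the fix enforces: the load address must be aligned to the
    # largest p_align of any PT_LOAD segment, not just to the host page
    # size. phdrs is a list of {"type", "align"} dicts.
    align = page_size
    for p in phdrs:
        if p["type"] == "PT_LOAD":
            align = max(align, p["align"])
    return align

# An AArch64 binary with 64 KiB-aligned segments must not be mapped at a
# merely 4 KiB-aligned address on a 4 KiB-page amd64 host.
```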

-----------------------------------------------------------------------------------------

[[issue]]
id = 10002
title = "qemu-aarch64-static segfaults running ldconfig.real (amd64 host)"
state = "closed"
created_at = "n/a"
closed_at = "2025-03-13T17:18:50+01:00"
labels = ["accel: TCG", "target: arm", "ubuntu"]
url = "https://bugs.launchpad.net/bugs/2072564"
host-os = "Ubuntu 24.04+"
host-arch = "amd64"
qemu-version = "1:9.0.2+ds-4ubuntu5.3"
guest-os = "Ubuntu 22.04 (arm64)"
guest-arch = "arm64"
description = """QEMU crashes when running ldconfig in an emulated arm64 guest, affecting qemu-user-static on amd64 hosts."""
reproduce = """Run 'ldconfig.real' in an emulated arm64 container or via qemu-aarch64-static."""
additional = """Upstream bug: https://gitlab.com/qemu-project/qemu/-/issues/1913. Fixed by ELF alignment patches."""
mistranslation = "no"
Third Thread:
Hi, all

When I did the cxl memory hot-plug test on QEMU, I accidentally connected 
two memdev to the same downstream port, the command like below:

> -object memory-backend-ram,size=262144k,share=on,id=vmem0 \
> -object memory-backend-ram,size=262144k,share=on,id=vmem1 \
> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
> -device cxl-upstream,bus=root_port0,id=us0 \
> -device cxl-downstream,port=0,bus=us0,id=swport00,chassis=0,slot=5 \
> -device cxl-downstream,port=0,bus=us0,id=swport01,chassis=0,slot=7 \
same downstream port but has different slot!

> -device cxl-type3,bus=swport00,volatile-memdev=vmem0,id=cxl-vmem0 \
> -device cxl-type3,bus=swport01,volatile-memdev=vmem1,id=cxl-vmem1 \
> -M 
> cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=64G,cxl-fmw.0.interleave-granularity=4k
>  \

There is no error occurred when vm start, but when I executed the “cxl list” 
command to view
the CXL objects info, the process can not end properly.

Then I used strace to trace the process, I found that the process is in 
infinity loop:
# strace cxl list
......
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
write(3, "1\n\0", 3)                    = 3
close(3)                                = 0
access("/run/udev/queue", F_OK)         = 0
(these five syscalls repeat indefinitely)

[Environment]:
linux: V6.10-rc3
QEMU: V9.0.0
ndctl: v79

I know this is caused by incorrect use of the QEMU command line, but I think
this error should be caught on at least one side: QEMU, the OS, or ndctl.
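The validation the reporter asks for is essentially a uniqueness check on (upstream bus, port number). A sketch of what such a check could look like (hypothetical helper, not actual QEMU code):

```python
def check_downstream_ports(devices):
    """Reject two cxl-downstream devices claiming the same port number
    on the same upstream bus, as in the command line above."""
    seen = {}
    for dev in devices:
        key = (dev["bus"], dev["port"])
        if key in seen:
            raise ValueError(f"downstream port {dev['port']} on "
                             f"{dev['bus']} already taken by {seen[key]}")
        seen[key] = dev["id"]

# swport00 and swport01 both claim port=0 on us0, so this configuration
# would be rejected at startup instead of booting silently.
```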

Thanks
Xingtao

-----------------------------------------------------------------------------------------

[[issue]]
id = 10003
title = "CXL memory hot-plug with incorrect configuration causes infinite loop in 'cxl list'"
state = "open"
created_at = "n/a"
closed_at = "n/a"
labels = ["device: cxl", "bug"]
url = "n/a"
host-os = "n/a"
host-arch = "n/a"
qemu-version = "9.0.0"
guest-os = "Linux (v6.10-rc3)"
guest-arch = "n/a"
description = """When incorrectly connecting two memdevs to the same CXL downstream port, 'cxl list' command enters infinite loop."""
reproduce = """1. Configure CXL with two memdevs on same downstream port (different slots)
2. Start VM
3. Run 'cxl list' command"""
additional = """Environment: QEMU v9.0.0, Linux v6.10-rc3, ndctl v79. Issue occurs despite incorrect configuration - should be caught by either QEMU, OS or ndctl."""
mistranslation = "no"
Fourth Thread:
[SRU] migration was active, but no RAM info was set
[Impact]

 * While live-migrating many instances concurrently, libvirt sometimes return `internal error: migration was active, but no RAM info was set:`

 * Effects of this bug are mostly observed in large scale clusters with a lot of live migration activity.

 * Has second order effects for consumers of migration monitor such as libvirt and openstack.

[Test Case]

Synthetic reproducer with GDB in comment #21.

Steps to Reproduce:
1. live evacuate a compute
2. live migration of one or more instances fails with the above error

N.B Due to the nature of this bug it is difficult consistently reproduce.
In an environment where it has been observed it is estimated to occur approximately 1/1000 migrations.

[Where problems could occur]
 * In the event of a regression the migration monitor may report an inconsistent state.

[Original Bug Description]

While live-migrating many instances concurrently, libvirt sometimes return internal error: migration was active, but no RAM info was set:
~~~
2022-03-30 06:08:37.197 7 WARNING nova.virt.libvirt.driver [req-5c3296cf-88ee-4af6-ae6a-ddba99935e23 - - - - -] [instance: af339c99-1182-4489-b15c-21e52f50f724] Error monitoring migration: internal error: migration was active, but no RAM info was set: libvirt.libvirtError: internal error: migration was active, but no RAM info was set
~~~

-----------------------------------------------------------------------------------------

[[issue]]
id = 10004
title = "Migration active but no RAM info set during concurrent live migrations"
state = "open"
created_at = "n/a"
closed_at = "n/a"
labels = ["migration", "libvirt", "openstack"]
url = "n/a"
host-os = "n/a"
host-arch = "n/a"
qemu-version = "n/a"
guest-os = "n/a"
guest-arch = "n/a"
description = """During concurrent live migrations, libvirt reports 'migration was active, but no RAM info was set' error intermittently"""
reproduce = """1. Perform live evacuation of compute node
2. Observe failure in approximately 1/1000 migrations with error message
3. Synthetic reproducer available via GDB in comments"""
additional = """Affects large scale clusters with heavy migration activity. Impacts libvirt and OpenStack migration monitoring. Difficult to reproduce consistently."""
mistranslation = "no"