diff options
| author | Christian Krinitsin <mail@krinitsin.com> | 2025-05-21 21:11:14 +0200 |
|---|---|---|
| committer | Christian Krinitsin <mail@krinitsin.com> | 2025-05-21 21:11:14 +0200 |
| commit | aa8bd79cec7bf6790ddb01d156c2ef2201abbaab (patch) | |
| tree | caf78aacbf158bbb8a7b86d2af075f8e5bd93bc1 | |
| parent | aaa3306a1b9e440c4adf9e54fcfd204243b54404 (diff) | |
| download | emulator-bug-study-aa8bd79cec7bf6790ddb01d156c2ef2201abbaab.tar.gz emulator-bug-study-aa8bd79cec7bf6790ddb01d156c2ef2201abbaab.zip | |
add llm prompts with output
| -rw-r--r-- | llm/chatgpt | 553 | ||||
| -rw-r--r-- | llm/deepseek | 656 |
2 files changed, 1209 insertions, 0 deletions
diff --git a/llm/chatgpt b/llm/chatgpt new file mode 100644 index 00000000..b178ca90 --- /dev/null +++ b/llm/chatgpt @@ -0,0 +1,553 @@ +If the tiny code generator of qemu makes an error in the binary translation, when emulating an instruction set architecture, we call it a semantic mistranslation bug. Here are three gitlab issues in the toml format. Tell me, if they describe a semantic mistranslation bug. Answer only with yes and no. + +1. issue: +id = 1371 +title = "x86 BLSMSK semantic bug" +state = "closed" +created_at = "2022-12-16T06:43:29.794Z" +closed_at = "2023-03-01T01:08:38.844Z" +labels = ["Closed::Fixed", "accel: TCG", "target: i386"] +url = "https://gitlab.com/qemu-project/qemu/-/issues/1371" +host-os = "Windows 10 20H2" +host-arch = "x86" +qemu-version = "7.1.90 (v7.2.0-rc0)" +guest-os = "None" +guest-arch = "x86" +description = """The result of instruction BLSMSK is different with from the CPU. The value of CF is different.""" +reproduce = """1. Compile this code +void main() { + asm("mov rax, 0x65b2e276ad27c67"); + asm("mov rbx, 0x62f34955226b2b5d"); + asm("blsmsk eax, ebx"); +} + +2. Execute and compare the result with the CPU. + - CPU + - CF = 0 + - QEMU + - CF = 1""" +additional = """This bug is discovered by research conducted by KAIST SoftSec.""" + +2. issue: +id = 1057 +title = "AArch64: ISV is set to 1 in ESR_EL2 when taking a data abort with post-indexed instructions" +state = "closed" +created_at = "2022-06-02T20:50:04.695Z" +closed_at = "2022-07-18T16:47:38.098Z" +labels = ["accel: TCG", "target: arm", "workflow::In Progress"] +url = "https://gitlab.com/qemu-project/qemu/-/issues/1057" +host-os = "MacOS 12.4" +host-arch = "ARM64" +qemu-version = "QEMU emulator version 7.0.0" +guest-os = "Custom - BedRock Hypervisor running with the NOVA microkernel" +guest-arch = "ARMv8" +description = """I think that I have a Qemu bug in my hands, but, I could still be missing something. Consider the following instruction: +0x0000000000000000: C3 44 00 B8 str w3, [x6], #4 + +notice the last #4, I think this is what we would call a post-indexed instruction (falls into the category of instructions with writeback). As I understand it, those instructions should not have ISV=1 in ESR_EL2 when faulting. + +Here is the relevant part of the manual: + +For other faults reported in ESR_EL2, ISV is 0 except for the following stage 2 aborts: +• AArch64 loads and stores of a single general-purpose register (including the register specified with 0b11111, including those with Acquire/Release semantics, but excluding Load Exclusive or Store Exclusive and excluding those with writeback). + + +However, I can see that Qemu sets ISV to 1 here. The ARM hardware that I tested gave me a value of ISV=0 for similar instructions. + +Another example of instruction: 0x00000000000002f8: 01 1C 40 38 ldrb w1, [x0, #1]!""" +reproduce = """1. Run some hypervisor in EL2 +2. Create a guest running at EL1 that executes one of the mentioned instructions (and make the instruction fault by writing to some unmapped page in SLP) +3. Observe the value of ESR_EL2 on data abort + +Unfortunately, I cannot provide an image to reproduce this (the software is not open-source). But, I would be happy to help test a patch.""" +additional = "n/a" + +3. issue: +id = 2106 +title = "QEMU build fail on Solaris 11.4 because \"FSCALE\" #defined by sys/param.h" +state = "opened" +created_at = "2024-01-16T19:39:42.479Z" +closed_at = "n/a" +labels = ["Build System", "target: arm"] +url = "https://gitlab.com/qemu-project/qemu/-/issues/2106" +host-os = "Solaris 11.4" +host-arch = "sparc" +qemu-version = "n/a" +guest-os = "n/a" +guest-arch = "n/a" +description = """Building target/arm/tcg/translate-sve.c fails on Solaris 11.4 because system's +/usr/include/sys/param.h has #define FSCALE (1 << FSHIFT) which results +in DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn) at translate-sve.c:3864 +attempting to expand the #define substitution instead of the text FSCALE.<p>I have not determined what the sequence of includes was that brought in sys/param.h<p>A workaround is to #undef FSCALE, but that may not be an appropriate long-term fix.""" +reproduce = """1. mkdir build && cd build +2. ../configure --disable-docs --disable-rdma --enable-slirp +3. gmake""" +additional = """Full diagnostic output: +[1865/5402] Compiling C object libqemu-aarch64-softmmu.fa.p/target_arm_tcg_translate-sve.c.o +FAILED: libqemu-aarch64-softmmu.fa.p/target_arm_tcg_translate-sve.c.o +cc -Ilibqemu-aarch64-softmmu.fa.p -I. -I.. -Itarget/arm -I../target/arm -Isubprojects/dtc/libfdt -I../subprojects/dtc/libfdt -Iqapi -Itrace -Iui -Iui/shader -I/usr/include/pixman-1 -I/usr/include/libdrm -I/usr/include/glib-2.0 -I/usr/lib/sparcv9/glib-2.0/include -I/usr/include/pcre -fdiagnostics-color=auto -Wall -Winvalid-pch -std=gnu11 -O2 -g -fstack-protector-strong -Wundef -Wwrite-strings -Wmissing-prototypes -Wstrict-prototypes -Wredundant-decls -Wold-style-declaration -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wimplicit-fallthrough=2 -Wmissing-format-attribute -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-psabi -Wshadow=local -iquote . -iquote /opt/qemu -iquote /opt/qemu/include -iquote /opt/qemu/host/include/generic -iquote /opt/qemu/tcg/sparc64 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fno-common -fwrapv -D_XOPEN_SOURCE=600 -D__EXTENSIONS__ -fPIE -D_REENTRANT -D_POSIX_PTHREAD_SEMANTICS -DNEED_CPU_H '-DCONFIG_TARGET="aarch64-softmmu-config-target.h"' '-DCONFIG_DEVICES="aarch64-softmmu-config-devices.h"' -MD -MQ libqemu-aarch64-softmmu.fa.p/target_arm_tcg_translate-sve.c.o -MF libqemu-aarch64-softmmu.fa.p/target_arm_tcg_translate-sve.c.o.d -o libqemu-aarch64-softmmu.fa.p/target_arm_tcg_translate-sve.c.o -c ../target/arm/tcg/translate-sve.c +In file included from ../target/arm/tcg/translate-sve.c:21: +../target/arm/tcg/translate.h:728:17: error: pasting "trans_" and "(" does not give a valid preprocessing token + 728 | static bool trans_##NAME(DisasContext *s, arg_##NAME *a) \\ + | ^~~~~~ +../target/arm/tcg/translate-sve.c:3854:5: note: in expansion of macro ‘TRANS_FEAT’ + 3854 | TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz, name##_zpzz_fns[a->esz], a) + | ^~~~~~~~~~ +../target/arm/tcg/translate-sve.c:3864:1: note: in expansion of macro ‘DO_ZPZZ_FP’ + 3864 | DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn) + | ^~~~~~~~~~ +../target/arm/tcg/translate-sve.c:3864:12: error: expected declaration specifiers or ‘...’ before numeric constant + 3864 | DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn) + | ^~~~~~ +../target/arm/tcg/translate.h:728:25: note: in definition of macro ‘TRANS_FEAT’ + 728 | static bool trans_##NAME(DisasContext *s, arg_##NAME *a) \\ + | ^~~~ +../target/arm/tcg/translate-sve.c:3864:1: note: in expansion of macro ‘DO_ZPZZ_FP’ + 3864 | DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn) + | ^~~~~~~~~~ +../target/arm/tcg/translate.h:728:47: error: pasting "arg_" and "(" does not give a valid preprocessing token + 728 | static bool trans_##NAME(DisasContext *s, arg_##NAME *a) \\ + | ^~~~ +../target/arm/tcg/translate-sve.c:3854:5: note: in expansion of macro ‘TRANS_FEAT’ + 3854 | TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz, name##_zpzz_fns[a->esz], a) + | ^~~~~~~~~~ +../target/arm/tcg/translate-sve.c:3864:1: note: in expansion of macro ‘DO_ZPZZ_FP’ + 3864 | DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn) + | ^~~~~~~~~~ +In file included from ../target/arm/tcg/translate-sve.c:86: +libqemu-aarch64-softmmu.fa.p/decode-sve.c.inc:1112:13: warning: ‘trans_FSCALE’ used but never defined + 1112 | static bool trans_FSCALE(DisasContext *ctx, arg_FSCALE *a); + | ^~~~~~~~~~~~ +../target/arm/tcg/translate-sve.c:3864:30: warning: ‘sve_fscalbn_zpzz_fns’ defined but not used [-Wunused-const-variable=] + 3864 | DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn) + | ^~~~~~~~~~~ +../target/arm/tcg/translate-sve.c:3850:42: note: in definition of macro ‘DO_ZPZZ_FP’ + 3850 | static gen_helper_gvec_4_ptr * const name##_zpzz_fns[4] = { \\ + | ^~~~ +""" + + +ChatGPT said: +Yes + +Yes + +No + +You said: +Now I will give you Mailing Threads. Transform them into the toml format i provided you, which should be the only output. If a variable is not given in the mails, write "n/a". The id's should be 10000 and upwards. the state is closed, when the bug is fixed, and open, when the bug is not fixed. First Thread: + +This bug was fixed in the package qemu - 1:9.0.2+ds-4ubuntu5.3 + +--------------- +qemu (1:9.0.2+ds-4ubuntu5.3) oracular; urgency=medium + + * d/p/u/lp2049698/*: Add full boot order support on s390x (LP: #2049698) + * Cherry-pick prerequisite for above backport (to avoid FTBFS): + - d/p/u/lp2049698/0-hw-s390x-sclp.c-include-s390-virtio-ccw.h-to-make.patch + * d/qemu-system-data.links: symlink s390-netboot.img -> s390-ccw.img for + backwards compatibility, as the code is now combined. + * Fix qemu-aarch64-static segfaults running ldconfig.real (LP: #2072564) + - lp-2072564-01-linux-user-Honor-elf-alignment-when-placing-images.patch + - lp-2072564-02-elfload-Fix-alignment-when-unmapping-excess-reservat.patch + Thanks to Dimitry Andric for identifying the fix. + + -- Lukas Märdian <slyon@ubuntu.com> Thu, 13 Mar 2025 17:18:50 +0100 + +** Changed in: qemu (Ubuntu Oracular) + Status: Fix Committed => Fix Released + +-- +You received this bug notification because you are a member of qemu- +devel-ml, which is subscribed to QEMU. +https://bugs.launchpad.net/bugs/2072564 + +Title: + qemu-aarch64-static segfaults running ldconfig.real (amd64 host) + +Status in QEMU: + Fix Released +Status in qemu package in Ubuntu: + Fix Released +Status in qemu source package in Noble: + Fix Committed +Status in qemu source package in Oracular: + Fix Released + +Bug description: + [ Impact ] + + * QEMU crashes when running (emulating) ldconfig in a Ubuntu 22.04 + arm64 guest + + * This affects the qemu-user-static 1:8.2.2+ds-0ubuntu1 package on + Ubuntu 24.04+, running on a amd64 host. + + * When running docker containers with Ubuntu 22.04 in them, emulating + arm64 with qemu-aarch64-static, invocations of ldconfig (actually + ldconfig.real) segfault, leading to problems when loading shared + libraries. + + [ Test Plan ] + + * Reproducer is very easy: + + $ sudo snap install docker + docker 27.5.1 from Canonical** installed + $ docker run -ti --platform linux/arm64/v8 ubuntu:22.04 + Unable to find image 'ubuntu:22.04' locally + 22.04: Pulling from library/ubuntu + 0d1c17d4e593: Pull complete + Digest: +sha256:ed1544e454989078f5dec1bfdabd8c5cc9c48e0705d07b678ab6ae3fb61952d2 + Status: Downloaded newer image for ubuntu:22.04 + + # Execute ldconfig.real inside the arm64 guest. + # This should not crash after the fix! + root@ad80af5378dc:/# /sbin/ldconfig.real + qemu: uncaught target signal 11 (Segmentation fault) - core dumped + Segmentation fault (core dumped) + + [ Where problems could occur ] + + * This changes the alignment of sections in the ELF binary via QEMUs + elfloader, if something goes wrong with this change, it could lead to + all kind of crashes (segfault) of any emulated binaries. + + [ Other Info ] + + * Upstream bug: https://gitlab.com/qemu-project/qemu/-/issues/1913 + * Upstream fix: https://gitlab.com/qemu-project/qemu/-/commit/4b7b20a3 + - Fix dependency (needed for QEMU < 9.20): +https://gitlab.com/qemu-project/qemu/-/commit/c81d1faf + + --- original bug report --- + + + This affects the qemu-user-static 1:8.2.2+ds-0ubuntu1 package on Ubuntu +24.04, running on a amd64 host. + + When running docker containers with Ubuntu 22.04 in them, emulating + arm64 with qemu-aarch64-static, invocations of ldconfig (actually + ldconfig.real) segfault. For example: + + $ docker run -ti --platform linux/arm64/v8 ubuntu:22.04 + root@8861ff640a1c:/# /sbin/ldconfig.real + Segmentation fault + + If you copy the ldconfig.real binary to the host, and run it directly + via qemu-aarch64-static: + + $ gdb --args qemu-aarch64-static ./ldconfig.real + GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git + Copyright (C) 2024 Free Software Foundation, Inc. + License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> + This is free software: you are free to change and redistribute it. + There is NO WARRANTY, to the extent permitted by law. + Type "show copying" and "show warranty" for details. + This GDB was configured as "x86_64-linux-gnu". + Type "show configuration" for configuration details. + For bug reporting instructions, please see: + <https://www.gnu.org/software/gdb/bugs/>. + Find the GDB manual and other documentation resources online at: + <http://www.gnu.org/software/gdb/documentation/>. + + For help, type "help". + Type "apropos word" to search for commands related to "word"... + Reading symbols from qemu-aarch64-static... + Reading symbols from +/home/dim/.cache/debuginfod_client/86579812b213be0964189499f62f176bea817bf2/debuginfo... + (gdb) r + Starting program: /usr/bin/qemu-aarch64-static ./ldconfig.real + [Thread debugging using libthread_db enabled] + Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". + [New Thread 0x7ffff76006c0 (LWP 28378)] + + Thread 1 "qemu-aarch64-st" received signal SIGSEGV, Segmentation fault. + 0x00007fffe801645b in ?? () + (gdb) disassemble + No function contains program counter for selected frame. + + It looks like this is a known qemu regression after v8.1.1: + https://gitlab.com/qemu-project/qemu/-/issues/1913 + + Downgrading the package to qemu-user- + static_8.0.4+dfsg-1ubuntu3_amd64.deb fixes the segfault. + +To manage notifications about this bug go to: +https://bugs.launchpad.net/qemu/+bug/2072564/+subscriptions + + +ChatGPT said: +toml +Copy +Edit +id = 10000 +title = "qemu-aarch64-static segfaults running ldconfig.real (amd64 host)" +state = "closed" +created_at = "n/a" +closed_at = "2025-03-13T17:18:50+01:00" +labels = ["accel: TCG", "target: arm", "linux-user", "elfload"] +url = "https://bugs.launchpad.net/qemu/+bug/2072564" +host-os = "Ubuntu 24.04" +host-arch = "amd64" +qemu-version = "1:8.2.2+ds-0ubuntu1" +guest-os = "Ubuntu 22.04" +guest-arch = "arm64" +description = """QEMU crashes when running ldconfig.real in an Ubuntu 22.04 arm64 guest under emulation with qemu-aarch64-static on an amd64 host. The issue is due to incorrect alignment handling in the ELF loader, resulting in a segmentation fault.""" +reproduce = """1. Run a docker container with Ubuntu 22.04 on an amd64 host using `--platform linux/arm64/v8`. +2. Execute `/sbin/ldconfig.real` inside the container. +3. Observe the segmentation fault.""" +additional = """Upstream bug: https://gitlab.com/qemu-project/qemu/-/issues/1913 +Upstream fix: https://gitlab.com/qemu-project/qemu/-/commit/4b7b20a3 +Dependency fix: https://gitlab.com/qemu-project/qemu/-/commit/c81d1faf""" + + + + + + +You said: +Do the same for the next thread. Additionally, add the variable 'mistranslation' with the values 'yes' or 'no', depending on whether the bug is a semantic mistranslation. + +On Tue, Apr 29, 2025 at 01:55:59PM +0800, Xiaoyao Li wrote: +> Date: Tue, 29 Apr 2025 13:55:59 +0800 +> From: Xiaoyao Li <xiaoyao.li@intel.com> +> Subject: Re: [Bug] QEMU TCG warnings after commit c6bd2dd63420 - HTT / +> CMP_LEG bits +> +> On 4/29/2025 11:02 AM, Ewan Hai wrote: +> > Hi Community, +> > +> > This email contains 3 bugs appear to share the same root cause. +> > +> > [1] We ran into the following warnings when running QEMU v10.0.0 in TCG +> > mode: +> > +> > qemu-system-x86_64 \ +> > -machine q35 \ +> > -m 4G -smp 4 \ +> > -kernel ./arch/x86/boot/bzImage \ +> > -bios /usr/share/ovmf/OVMF.fd \ +> > -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \ +> > -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \ +> > -nographic \ +> > -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr' +> > +> > qemu-system-x86_64: warning: TCG doesn't support requested feature: +> > CPUID.01H:EDX.ht [bit 28] +> > qemu-system-x86_64: warning: TCG doesn't support requested feature: +> > CPUID.80000001H:ECX.cmp-legacy [bit 1] +> > (repeats 4 times, once per vCPU) +> > +> > Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up +> > CPUID_HT in x86_cpu_expand_features() instead of cpu_x86_cpuid()" is +> > what introduced the warnings. +> > +> > Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28]) +> > and CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT +> > support, these bits trigger the warnings above. +> > +> > [2] Also, Zhao pointed me to a similar report on GitLab: +> > https://gitlab.com/qemu-project/qemu/-/issues/2894 +> > The symptoms there look identical to what we're seeing. +> > +> > By convention we file one issue per email, but these two appear to share +> > the same root cause, so I'm describing them together here. +> +> It was caused by my two patches. I think the fix can be as follow. +> If no objection from the community, I can submit the formal patch. +> +> diff --git a/target/i386/cpu.c b/target/i386/cpu.c +> index 1f970aa4daa6..fb95aadd6161 100644 +> --- a/target/i386/cpu.c +> +++ b/target/i386/cpu.c +> @@ -776,11 +776,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t +> vendor1, +> CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC | CPUID_SEP | \ +> CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \ +> CPUID_PSE36 | CPUID_CLFLUSH | CPUID_ACPI | CPUID_MMX | \ +> - CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE) +> + CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE | \ +> + CPUID_HT) +> /* partly implemented: +> CPUID_MTRR, CPUID_MCA, CPUID_CLFLUSH (needed for Win64) */ +> /* missing: +> - CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_HT, CPUID_TM, CPUID_PBE */ +> + CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_TM, CPUID_PBE */ +> +> /* +> * Kernel-only features that can be shown to usermode programs even if +> @@ -848,7 +849,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t +> vendor1, +> +> #define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \ +> CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A | \ +> - CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES) +> + CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES | \ +> + CPUID_EXT3_CMP_LEG) +> +> #define TCG_EXT4_FEATURES 0 + +This fix is fine for me...at least from SDM, HTT depends on topology and +it should exist when user sets "-smp 4". + +> > [3] My colleague Alan noticed what appears to be a related problem: if +> > we launch a guest with '-cpu <model>,-ht --enable-kvm', which means +> > explicitly removing the ht flag, but the guest still reports HT(cat +> > /proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht +> > bit seems to be forced on even when the user tries to disable it. +> +> XiaoYao reminded me that issue [3] stems from a different patch. Please +> ignore it for now—I'll start a separate thread to discuss that one +> independently. + +I haven't found any other thread :-). + +By the way, just curious, in what cases do you need to disbale the HT +flag? "-smp 4" means 4 cores with 1 thread per core, and is it not +enough? + +As for the “-ht” behavior, I'm also unsure whether this should be fixed +or not - one possible consideration is whether “-ht” would be useful. + +On Tue, Apr 29, 2025 at 01:55:59PM +0800, Xiaoyao Li wrote: +> Date: Tue, 29 Apr 2025 13:55:59 +0800 +> From: Xiaoyao Li <xiaoyao.li@intel.com> +> Subject: Re: [Bug] QEMU TCG warnings after commit c6bd2dd63420 - HTT / +> CMP_LEG bits +> +> On 4/29/2025 11:02 AM, Ewan Hai wrote: +> > Hi Community, +> > +> > This email contains 3 bugs appear to share the same root cause. +> > +> > [1] We ran into the following warnings when running QEMU v10.0.0 in TCG +> > mode: +> > +> > qemu-system-x86_64 \ +> > -machine q35 \ +> > -m 4G -smp 4 \ +> > -kernel ./arch/x86/boot/bzImage \ +> > -bios /usr/share/ovmf/OVMF.fd \ +> > -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \ +> > -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \ +> > -nographic \ +> > -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr' +> > +> > qemu-system-x86_64: warning: TCG doesn't support requested feature: +> > CPUID.01H:EDX.ht [bit 28] +> > qemu-system-x86_64: warning: TCG doesn't support requested feature: +> > CPUID.80000001H:ECX.cmp-legacy [bit 1] +> > (repeats 4 times, once per vCPU) +> > +> > Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up +> > CPUID_HT in x86_cpu_expand_features() instead of cpu_x86_cpuid()" is +> > what introduced the warnings. +> > +> > Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28]) +> > and CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT +> > support, these bits trigger the warnings above. +> > +> > [2] Also, Zhao pointed me to a similar report on GitLab: +> > https://gitlab.com/qemu-project/qemu/-/issues/2894 +> > The symptoms there look identical to what we're seeing. +> > +> > By convention we file one issue per email, but these two appear to share +> > the same root cause, so I'm describing them together here. +> +> It was caused by my two patches. I think the fix can be as follow. +> If no objection from the community, I can submit the formal patch. +> +> diff --git a/target/i386/cpu.c b/target/i386/cpu.c +> index 1f970aa4daa6..fb95aadd6161 100644 +> --- a/target/i386/cpu.c +> +++ b/target/i386/cpu.c +> @@ -776,11 +776,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t +> vendor1, +> CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC | CPUID_SEP | \ +> CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \ +> CPUID_PSE36 | CPUID_CLFLUSH | CPUID_ACPI | CPUID_MMX | \ +> - CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE) +> + CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE | \ +> + CPUID_HT) +> /* partly implemented: +> CPUID_MTRR, CPUID_MCA, CPUID_CLFLUSH (needed for Win64) */ +> /* missing: +> - CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_HT, CPUID_TM, CPUID_PBE */ +> + CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_TM, CPUID_PBE */ +> +> /* +> * Kernel-only features that can be shown to usermode programs even if +> @@ -848,7 +849,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t +> vendor1, +> +> #define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \ +> CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A | \ +> - CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES) +> + CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES | \ +> + CPUID_EXT3_CMP_LEG) +> +> #define TCG_EXT4_FEATURES 0 + +This fix is fine for me...at least from SDM, HTT depends on topology and +it should exist when user sets "-smp 4". + +> > [3] My colleague Alan noticed what appears to be a related problem: if +> > we launch a guest with '-cpu <model>,-ht --enable-kvm', which means +> > explicitly removing the ht flag, but the guest still reports HT(cat +> > /proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht +> > bit seems to be forced on even when the user tries to disable it. +> +> XiaoYao reminded me that issue [3] stems from a different patch. Please +> ignore it for now—I'll start a separate thread to discuss that one +> independently. + +I haven't found any other thread :-). + +By the way, just curious, in what cases do you need to disbale the HT +flag? "-smp 4" means 4 cores with 1 thread per core, and is it not +enough? + +As for the “-ht” behavior, I'm also unsure whether this should be fixed +or not - one possible consideration is whether “-ht” would be useful. + + +ChatGPT said: +Here’s a structured JSON extraction of the bug information from the email thread, with the new "mistranslation" field added: + +json +Copy +Edit +[ + { + "id": 1, + "mode": "TCG", + "cpu_flags": ["CPUID.01H:EDX.ht [bit 28]", "CPUID.80000001H:ECX.cmp-legacy [bit 1]"], + "symptom": "QEMU shows warnings that TCG doesn't support requested features HTT and CMP_LEG", + "introduced_by_commit": "c6bd2dd63420", + "cause": "HTT and CMP_LEG are now unconditionally advertised by TCG after the commit, but TCG has no SMT support", + "related_issues": ["https://gitlab.com/qemu-project/qemu/-/issues/2894"], + "fix_proposed": true, + "fix_details": "Patch to conditionally advertise HTT and CMP_LEG features based on capability", + "mistranslation": "no" + }, + { + "id": 2, + "mode": "KVM", + "cpu_flags": ["-ht"], + "symptom": "Guest reports HT enabled even when the ht flag is explicitly disabled", + "introduced_by_commit": null, + "cause": "Different patch (not detailed in this thread); HT bit appears forced on under KVM", + "related_issues": [], + "fix_proposed": false, + "fix_details": "Will be discussed in a separate thread", + "mistranslation": "yes" + } +] +Explanation of "mistranslation" field: +Bug 1 is not a semantic mistranslation—it’s a configuration logic error where unsupported features are incorrectly exposed. Hence, "mistranslation": "no". + +Bug 2 is a semantic mistranslation—the software allows the user to remove a flag (-ht) but internally contradicts that request by still enabling the feature. This mismatch between user intent and actual behavior is a semantic mistranslation. Hence, "mistranslation": "yes". + +Let me know if you want these grouped or exported in another format. diff --git a/llm/deepseek b/llm/deepseek new file mode 100644 index 00000000..674b9d60 --- /dev/null +++ b/llm/deepseek @@ -0,0 +1,656 @@ +If the tiny code generator of qemu makes an error in the binary translation, when emulating an instruction set architecture, we call it a semantic mistranslation bug. Here are three gitlab issues in the toml format. Tell me, if they describe a semantic mistranslation bug. Answer only with yes and no. + +1. issue: +id = 1371 +title = "x86 BLSMSK semantic bug" +state = "closed" +created_at = "2022-12-16T06:43:29.794Z" +closed_at = "2023-03-01T01:08:38.844Z" +labels = ["Closed::Fixed", "accel: TCG", "target: i386"] +url = "https://gitlab.com/qemu-project/qemu/-/issues/1371" +host-os = "Windows 10 20H2" +host-arch = "x86" +qemu-version = "7.1.90 (v7.2.0-rc0)" +guest-os = "None" +guest-arch = "x86" +description = """The result of instruction BLSMSK is different with from the CPU. The value of CF is different.""" +reproduce = """1. Compile this code +void main() { + asm("mov rax, 0x65b2e276ad27c67"); + asm("mov rbx, 0x62f34955226b2b5d"); + asm("blsmsk eax, ebx"); +} + +2. Execute and compare the result with the CPU. + - CPU + - CF = 0 + - QEMU + - CF = 1""" +additional = """This bug is discovered by research conducted by KAIST SoftSec.""" + +2. issue: +id = 1057 +title = "AArch64: ISV is set to 1 in ESR_EL2 when taking a data abort with post-indexed instructions" +state = "closed" +created_at = "2022-06-02T20:50:04.695Z" +closed_at = "2022-07-18T16:47:38.098Z" +labels = ["accel: TCG", "target: arm", "workflow::In Progress"] +url = "https://gitlab.com/qemu-project/qemu/-/issues/1057" +host-os = "MacOS 12.4" +host-arch = "ARM64" +qemu-version = "QEMU emulator version 7.0.0" +guest-os = "Custom - BedRock Hypervisor running with the NOVA microkernel" +guest-arch = "ARMv8" +description = """I think that I have a Qemu bug in my hands, but, I could still be missing something. Consider the following instruction: +0x0000000000000000: C3 44 00 B8 str w3, [x6], #4 + +notice the last #4, I think this is what we would call a post-indexed instruction (falls into the category of instructions with writeback). As I understand it, those instructions should not have ISV=1 in ESR_EL2 when faulting. + +Here is the relevant part of the manual: + +For other faults reported in ESR_EL2, ISV is 0 except for the following stage 2 aborts: +• AArch64 loads and stores of a single general-purpose register (including the register specified with 0b11111, including those with Acquire/Release semantics, but excluding Load Exclusive or Store Exclusive and excluding those with writeback). + + +However, I can see that Qemu sets ISV to 1 here. The ARM hardware that I tested gave me a value of ISV=0 for similar instructions. + +Another example of instruction: 0x00000000000002f8: 01 1C 40 38 ldrb w1, [x0, #1]!""" +reproduce = """1. Run some hypervisor in EL2 +2. Create a guest running at EL1 that executes one of the mentioned instructions (and make the instruction fault by writing to some unmapped page in SLP) +3. Observe the value of ESR_EL2 on data abort + +Unfortunately, I cannot provide an image to reproduce this (the software is not open-source). But, I would be happy to help test a patch.""" +additional = "n/a" + +----------------------------------------------------------------------------------------- + +yes + +no + +----------------------------------------------------------------------------------------- + +Now I will give you Mailing Threads. Transform them into the toml format i provided you, which should be the only output. If a variable is not given in the mails, write "n/a". The id's should be 10000 and upwards. the state is closed, when the bug is fixed, and open, when the bug is not fixed. Additionally, add a 'mistranslation' variable, which is set to 'yes', if the bug is a mistranslation bug, and 'no', if not. First Thread: + +On Tue, Apr 29, 2025 at 01:55:59PM +0800, Xiaoyao Li wrote: +> Date: Tue, 29 Apr 2025 13:55:59 +0800 +> From: Xiaoyao Li <xiaoyao.li@intel.com> +> Subject: Re: [Bug] QEMU TCG warnings after commit c6bd2dd63420 - HTT / +> CMP_LEG bits +> +> On 4/29/2025 11:02 AM, Ewan Hai wrote: +> > Hi Community, +> > +> > This email contains 3 bugs appear to share the same root cause. +> > +> > [1] We ran into the following warnings when running QEMU v10.0.0 in TCG +> > mode: +> > +> > qemu-system-x86_64 \ +> > -machine q35 \ +> > -m 4G -smp 4 \ +> > -kernel ./arch/x86/boot/bzImage \ +> > -bios /usr/share/ovmf/OVMF.fd \ +> > -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \ +> > -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \ +> > -nographic \ +> > -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr' +> > +> > qemu-system-x86_64: warning: TCG doesn't support requested feature: +> > CPUID.01H:EDX.ht [bit 28] +> > qemu-system-x86_64: warning: TCG doesn't support requested feature: +> > CPUID.80000001H:ECX.cmp-legacy [bit 1] +> > (repeats 4 times, once per vCPU) +> > +> > Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up +> > CPUID_HT in x86_cpu_expand_features() instead of cpu_x86_cpuid()" is +> > what introduced the warnings. +> > +> > Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28]) +> > and CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT +> > support, these bits trigger the warnings above. +> > +> > [2] Also, Zhao pointed me to a similar report on GitLab: +> > https://gitlab.com/qemu-project/qemu/-/issues/2894 +> > The symptoms there look identical to what we're seeing. +> > +> > By convention we file one issue per email, but these two appear to share +> > the same root cause, so I'm describing them together here. +> +> It was caused by my two patches. I think the fix can be as follow. +> If no objection from the community, I can submit the formal patch. +> +> diff --git a/target/i386/cpu.c b/target/i386/cpu.c +> index 1f970aa4daa6..fb95aadd6161 100644 +> --- a/target/i386/cpu.c +> +++ b/target/i386/cpu.c +> @@ -776,11 +776,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t +> vendor1, +> CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC | CPUID_SEP | \ +> CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \ +> CPUID_PSE36 | CPUID_CLFLUSH | CPUID_ACPI | CPUID_MMX | \ +> - CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE) +> + CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE | \ +> + CPUID_HT) +> /* partly implemented: +> CPUID_MTRR, CPUID_MCA, CPUID_CLFLUSH (needed for Win64) */ +> /* missing: +> - CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_HT, CPUID_TM, CPUID_PBE */ +> + CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_TM, CPUID_PBE */ +> +> /* +> * Kernel-only features that can be shown to usermode programs even if +> @@ -848,7 +849,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t +> vendor1, +> +> #define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \ +> CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A | \ +> - CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES) +> + CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES | \ +> + CPUID_EXT3_CMP_LEG) +> +> #define TCG_EXT4_FEATURES 0 + +This fix is fine for me...at least from SDM, HTT depends on topology and +it should exist when user sets "-smp 4". + +> > [3] My colleague Alan noticed what appears to be a related problem: if +> > we launch a guest with '-cpu <model>,-ht --enable-kvm', which means +> > explicitly removing the ht flag, but the guest still reports HT(cat +> > /proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht +> > bit seems to be forced on even when the user tries to disable it. +> +> XiaoYao reminded me that issue [3] stems from a different patch. Please +> ignore it for now—I'll start a separate thread to discuss that one +> independently. + +I haven't found any other thread :-). + +By the way, just curious, in what cases do you need to disbale the HT +flag? "-smp 4" means 4 cores with 1 thread per core, and is it not +enough? + +As for the “-ht” behavior, I'm also unsure whether this should be fixed +or not - one possible consideration is whether “-ht” would be useful. + +On Tue, Apr 29, 2025 at 01:55:59PM +0800, Xiaoyao Li wrote: +> Date: Tue, 29 Apr 2025 13:55:59 +0800 +> From: Xiaoyao Li <xiaoyao.li@intel.com> +> Subject: Re: [Bug] QEMU TCG warnings after commit c6bd2dd63420 - HTT / +> CMP_LEG bits +> +> On 4/29/2025 11:02 AM, Ewan Hai wrote: +> > Hi Community, +> > +> > This email contains 3 bugs appear to share the same root cause. +> > +> > [1] We ran into the following warnings when running QEMU v10.0.0 in TCG +> > mode: +> > +> > qemu-system-x86_64 \ +> > -machine q35 \ +> > -m 4G -smp 4 \ +> > -kernel ./arch/x86/boot/bzImage \ +> > -bios /usr/share/ovmf/OVMF.fd \ +> > -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \ +> > -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \ +> > -nographic \ +> > -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr' +> > +> > qemu-system-x86_64: warning: TCG doesn't support requested feature: +> > CPUID.01H:EDX.ht [bit 28] +> > qemu-system-x86_64: warning: TCG doesn't support requested feature: +> > CPUID.80000001H:ECX.cmp-legacy [bit 1] +> > (repeats 4 times, once per vCPU) +> > +> > Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up +> > CPUID_HT in x86_cpu_expand_features() instead of cpu_x86_cpuid()" is +> > what introduced the warnings. +> > +> > Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28]) +> > and CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT +> > support, these bits trigger the warnings above. +> > +> > [2] Also, Zhao pointed me to a similar report on GitLab: +> > https://gitlab.com/qemu-project/qemu/-/issues/2894 +> > The symptoms there look identical to what we're seeing. +> > +> > By convention we file one issue per email, but these two appear to share +> > the same root cause, so I'm describing them together here. +> +> It was caused by my two patches. I think the fix can be as follow. +> If no objection from the community, I can submit the formal patch. +> +> diff --git a/target/i386/cpu.c b/target/i386/cpu.c +> index 1f970aa4daa6..fb95aadd6161 100644 +> --- a/target/i386/cpu.c +> +++ b/target/i386/cpu.c +> @@ -776,11 +776,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t +> vendor1, +> CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC | CPUID_SEP | \ +> CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \ +> CPUID_PSE36 | CPUID_CLFLUSH | CPUID_ACPI | CPUID_MMX | \ +> - CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE) +> + CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE | \ +> + CPUID_HT) +> /* partly implemented: +> CPUID_MTRR, CPUID_MCA, CPUID_CLFLUSH (needed for Win64) */ +> /* missing: +> - CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_HT, CPUID_TM, CPUID_PBE */ +> + CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_TM, CPUID_PBE */ +> +> /* +> * Kernel-only features that can be shown to usermode programs even if +> @@ -848,7 +849,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t +> vendor1, +> +> #define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \ +> CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A | \ +> - CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES) +> + CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES | \ +> + CPUID_EXT3_CMP_LEG) +> +> #define TCG_EXT4_FEATURES 0 + +This fix is fine for me...at least from SDM, HTT depends on topology and +it should exist when user sets "-smp 4". + +> > [3] My colleague Alan noticed what appears to be a related problem: if +> > we launch a guest with '-cpu <model>,-ht --enable-kvm', which means +> > explicitly removing the ht flag, but the guest still reports HT(cat +> > /proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht +> > bit seems to be forced on even when the user tries to disable it. +> +> XiaoYao reminded me that issue [3] stems from a different patch. Please +> ignore it for now—I'll start a separate thread to discuss that one +> independently. + +I haven't found any other thread :-). + +By the way, just curious, in what cases do you need to disbale the HT +flag? "-smp 4" means 4 cores with 1 thread per core, and is it not +enough? + +As for the “-ht” behavior, I'm also unsure whether this should be fixed +or not - one possible consideration is whether “-ht” would be useful. + +----------------------------------------------------------------------------------------- + +toml +[[issue]] +id = 10000 +title = "QEMU TCG warnings after commit c6bd2dd63420 - HTT / CMP_LEG bits" +state = "closed" +created_at = "2025-04-29T11:02:00.000Z" +closed_at = "n/a" +labels = ["accel: TCG", "target: i386"] +url = "n/a" +host-os = "n/a" +host-arch = "x86" +qemu-version = "10.0.0" +guest-os = "Linux" +guest-arch = "x86" +description = """TCG unconditionally advertises HTT (CPUID 1 EDX[28]) and CMP_LEG (CPUID 8000_0001 ECX[1]) despite lacking SMT support, causing warnings.""" +reproduce = """Run QEMU in TCG mode with a Linux guest and observe CPUID-related warnings.""" +additional = """Related to GitLab issue #2894. Fix involves adjusting CPU feature flags in target/i386/cpu.c.""" +mistranslation = "no" + +[[issue]] +id = 10001 +title = "KVM forces HT flag even when explicitly disabled" +state = "open" +created_at = "2025-04-29T11:02:00.000Z" +closed_at = "n/a" +labels = ["accel: KVM", "target: i386"] +url = "n/a" +host-os = "n/a" +host-arch = "x86" +qemu-version = "10.0.0" +guest-os = "Linux" +guest-arch = "x86" +description = """When launching a guest with '-cpu <model>,-ht --enable-kvm', the HT flag remains enabled in /proc/cpuinfo.""" +reproduce = """Launch a KVM guest with HT explicitly disabled and check CPU flags.""" +additional = """Separate issue from the TCG warnings. Discussion pending in another thread.""" +mistranslation = "no" +Second Thread: +This bug was fixed in the package qemu - 1:9.0.2+ds-4ubuntu5.3 + +----------------------------------------------------------------------------------------- + +qemu (1:9.0.2+ds-4ubuntu5.3) oracular; urgency=medium + + * d/p/u/lp2049698/*: Add full boot order support on s390x (LP: #2049698) + * Cherry-pick prerequisite for above backport (to avoid FTBFS): + - d/p/u/lp2049698/0-hw-s390x-sclp.c-include-s390-virtio-ccw.h-to-make.patch + * d/qemu-system-data.links: symlink s390-netboot.img -> s390-ccw.img for + backwards compatibility, as the code is now combined. + * Fix qemu-aarch64-static segfaults running ldconfig.real (LP: #2072564) + - lp-2072564-01-linux-user-Honor-elf-alignment-when-placing-images.patch + - lp-2072564-02-elfload-Fix-alignment-when-unmapping-excess-reservat.patch + Thanks to Dimitry Andric for identifying the fix. + + -- Lukas Märdian <slyon@ubuntu.com> Thu, 13 Mar 2025 17:18:50 +0100 + +** Changed in: qemu (Ubuntu Oracular) + Status: Fix Committed => Fix Released + +-- +You received this bug notification because you are a member of qemu- +devel-ml, which is subscribed to QEMU. +https://bugs.launchpad.net/bugs/2072564 + +Title: + qemu-aarch64-static segfaults running ldconfig.real (amd64 host) + +Status in QEMU: + Fix Released +Status in qemu package in Ubuntu: + Fix Released +Status in qemu source package in Noble: + Fix Committed +Status in qemu source package in Oracular: + Fix Released + +Bug description: + [ Impact ] + + * QEMU crashes when running (emulating) ldconfig in a Ubuntu 22.04 + arm64 guest + + * This affects the qemu-user-static 1:8.2.2+ds-0ubuntu1 package on + Ubuntu 24.04+, running on a amd64 host. + + * When running docker containers with Ubuntu 22.04 in them, emulating + arm64 with qemu-aarch64-static, invocations of ldconfig (actually + ldconfig.real) segfault, leading to problems when loading shared + libraries. + + [ Test Plan ] + + * Reproducer is very easy: + + $ sudo snap install docker + docker 27.5.1 from Canonical** installed + $ docker run -ti --platform linux/arm64/v8 ubuntu:22.04 + Unable to find image 'ubuntu:22.04' locally + 22.04: Pulling from library/ubuntu + 0d1c17d4e593: Pull complete + Digest: +sha256:ed1544e454989078f5dec1bfdabd8c5cc9c48e0705d07b678ab6ae3fb61952d2 + Status: Downloaded newer image for ubuntu:22.04 + + # Execute ldconfig.real inside the arm64 guest. + # This should not crash after the fix! + root@ad80af5378dc:/# /sbin/ldconfig.real + qemu: uncaught target signal 11 (Segmentation fault) - core dumped + Segmentation fault (core dumped) + + [ Where problems could occur ] + + * This changes the alignment of sections in the ELF binary via QEMUs + elfloader, if something goes wrong with this change, it could lead to + all kind of crashes (segfault) of any emulated binaries. + + [ Other Info ] + + * Upstream bug: https://gitlab.com/qemu-project/qemu/-/issues/1913 + * Upstream fix: https://gitlab.com/qemu-project/qemu/-/commit/4b7b20a3 + - Fix dependency (needed for QEMU < 9.20): +https://gitlab.com/qemu-project/qemu/-/commit/c81d1faf + + --- original bug report --- + + + This affects the qemu-user-static 1:8.2.2+ds-0ubuntu1 package on Ubuntu +24.04, running on a amd64 host. + + When running docker containers with Ubuntu 22.04 in them, emulating + arm64 with qemu-aarch64-static, invocations of ldconfig (actually + ldconfig.real) segfault. For example: + + $ docker run -ti --platform linux/arm64/v8 ubuntu:22.04 + root@8861ff640a1c:/# /sbin/ldconfig.real + Segmentation fault + + If you copy the ldconfig.real binary to the host, and run it directly + via qemu-aarch64-static: + + $ gdb --args qemu-aarch64-static ./ldconfig.real + GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git + Copyright (C) 2024 Free Software Foundation, Inc. + License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> + This is free software: you are free to change and redistribute it. + There is NO WARRANTY, to the extent permitted by law. + Type "show copying" and "show warranty" for details. + This GDB was configured as "x86_64-linux-gnu". + Type "show configuration" for configuration details. + For bug reporting instructions, please see: + <https://www.gnu.org/software/gdb/bugs/>. + Find the GDB manual and other documentation resources online at: + <http://www.gnu.org/software/gdb/documentation/>. + + For help, type "help". + Type "apropos word" to search for commands related to "word"... + Reading symbols from qemu-aarch64-static... + Reading symbols from +/home/dim/.cache/debuginfod_client/86579812b213be0964189499f62f176bea817bf2/debuginfo... + (gdb) r + Starting program: /usr/bin/qemu-aarch64-static ./ldconfig.real + [Thread debugging using libthread_db enabled] + Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". + [New Thread 0x7ffff76006c0 (LWP 28378)] + + Thread 1 "qemu-aarch64-st" received signal SIGSEGV, Segmentation fault. + 0x00007fffe801645b in ?? () + (gdb) disassemble + No function contains program counter for selected frame. + + It looks like this is a known qemu regression after v8.1.1: + https://gitlab.com/qemu-project/qemu/-/issues/1913 + + Downgrading the package to qemu-user- + static_8.0.4+dfsg-1ubuntu3_amd64.deb fixes the segfault. + +To manage notifications about this bug go to: +https://bugs.launchpad.net/qemu/+bug/2072564/+subscriptions + +----------------------------------------------------------------------------------------- + +toml +[[issue]] +id = 10002 +title = "qemu-aarch64-static segfaults running ldconfig.real (amd64 host)" +state = "closed" +created_at = "n/a" +closed_at = "2025-03-13T17:18:50+01:00" +labels = ["accel: TCG", "target: arm", "ubuntu"] +url = "https://bugs.launchpad.net/bugs/2072564" +host-os = "Ubuntu 24.04+" +host-arch = "amd64" +qemu-version = "1:9.0.2+ds-4ubuntu5.3" +guest-os = "Ubuntu 22.04 (arm64)" +guest-arch = "arm64" +description = """QEMU crashes when running ldconfig in an emulated arm64 guest, affecting qemu-user-static on amd64 hosts.""" +reproduce = """Run 'ldconfig.real' in an emulated arm64 container or via qemu-aarch64-static.""" +additional = """Upstream bug: https://gitlab.com/qemu-project/qemu/-/issues/1913. Fixed by ELF alignment patches.""" +mistranslation = "no" +Third Thread: +Hi, all + +When I did the cxl memory hot-plug test on QEMU, I accidentally connected +two memdev to the same downstream port, the command like below: + +> -object memory-backend-ram,size=262144k,share=on,id=vmem0 \ +> -object memory-backend-ram,size=262144k,share=on,id=vmem1 \ +> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ +> -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ +> -device cxl-upstream,bus=root_port0,id=us0 \ +> -device cxl-downstream,port=0,bus=us0,id=swport00,chassis=0,slot=5 \ +> -device cxl-downstream,port=0,bus=us0,id=swport01,chassis=0,slot=7 \ +same downstream port but has different slot! + +> -device cxl-type3,bus=swport00,volatile-memdev=vmem0,id=cxl-vmem0 \ +> -device cxl-type3,bus=swport01,volatile-memdev=vmem1,id=cxl-vmem1 \ +> -M +> cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=64G,cxl-fmw.0.interleave-granularity=4k +> \ + +There is no error occurred when vm start, but when I executed the “cxl list” +command to view +the CXL objects info, the process can not end properly. + +Then I used strace to trace the process, I found that the process is in +infinity loop: +# strace cxl list +...... +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 + +[Environment]: +linux: V6.10-rc3 +QEMU: V9.0.0 +ndctl: v79 + +I know this is because of the wrong use of the QEMU command, but I think we +should +be aware of this error in one of the QEMU, OS or ndctl side at least. + +Thanks +Xingtao + +----------------------------------------------------------------------------------------- + +toml +[[issue]] +id = 10003 +title = "CXL memory hot-plug with incorrect configuration causes infinite loop in 'cxl list'" +state = "open" +created_at = "n/a" +closed_at = "n/a" +labels = ["device: cxl", "bug"] +url = "n/a" +host-os = "n/a" +host-arch = "n/a" +qemu-version = "9.0.0" +guest-os = "Linux (v6.10-rc3)" +guest-arch = "n/a" +description = """When incorrectly connecting two memdevs to the same CXL downstream port, 'cxl list' command enters infinite loop.""" +reproduce = """1. Configure CXL with two memdevs on same downstream port (different slots) +2. Start VM +3. Run 'cxl list' command""" +additional = """Environment: QEMU v9.0.0, Linux v6.10-rc3, ndctl v79. Issue occurs despite incorrect configuration - should be caught by either QEMU, OS or ndctl.""" +mistranslation = "no" +Fourth Thread: +[SRU] migration was active, but no RAM info was set +[Impact] + + * While live-migrating many instances concurrently, libvirt sometimes return `internal error: migration was active, but no RAM info was set:` + + * Effects of this bug are mostly observed in large scale clusters with a lot of live migration activity. + + * Has second order effects for consumers of migration monitor such as libvirt and openstack. + +[Test Case] + +Synthetic reproducer with GDB in comment #21. + +Steps to Reproduce: +1. live evacuate a compute +2. live migration of one or more instances fails with the above error + +N.B Due to the nature of this bug it is difficult consistently reproduce. +In an environment where it has been observed it is estimated to occur approximately 1/1000 migrations. + +[Where problems could occur] + * In the event of a regression the migration monitor may report an inconsistent state. + +[Original Bug Description] + +While live-migrating many instances concurrently, libvirt sometimes return internal error: migration was active, but no RAM info was set: +~~~ +2022-03-30 06:08:37.197 7 WARNING nova.virt.libvirt.driver [req-5c3296cf-88ee-4af6-ae6a-ddba99935e23 - - - - -] [instance: af339c99-1182-4489-b15c-21e52f50f724] Error monitoring migration: internal error: migration was active, but no RAM info was set: libvirt.libvirtError: internal error: migration was active, but no RAM info was set +[Impact] + + * While live-migrating many instances concurrently, libvirt sometimes return `internal error: migration was active, but no RAM info was set:` + + * Effects of this bug are mostly observed in large scale clusters with a lot of live migration activity. + + * Has second order effects for consumers of migration monitor such as libvirt and openstack. + +[Test Case] + +Synthetic reproducer with GDB in comment #21. + +Steps to Reproduce: +1. live evacuate a compute +2. live migration of one or more instances fails with the above error + +N.B Due to the nature of this bug it is difficult consistently reproduce. +In an environment where it has been observed it is estimated to occur approximately 1/1000 migrations. + +[Where problems could occur] + * In the event of a regression the migration monitor may report an inconsistent state. + +[Original Bug Description] + +While live-migrating many instances concurrently, libvirt sometimes return internal error: migration was active, but no RAM info was set: +~~~ +2022-03-30 06:08:37.197 7 WARNING nova.virt.libvirt.driver [req-5c3296cf-88ee-4af6-ae6a-ddba99935e23 - - - - -] [instance: af339c99-1182-4489-b15c-21e52f50f724] Error monitoring migration: internal error: migration was active, but no RAM info was set: libvirt.libvirtError: internal error: migration was active, but no RAM info was set +toml +[[issue]] +id = 10004 +title = "Migration active but no RAM info set during concurrent live migrations" +state = "open" +created_at = "n/a" +closed_at = "n/a" +labels = ["migration", "libvirt", "openstack"] +url = "n/a" +host-os = "n/a" +host-arch = "n/a" +qemu-version = "n/a" +guest-os = "n/a" +guest-arch = "n/a" +description = """During concurrent live migrations, libvirt reports 'migration was active, but no RAM info was set' error intermittently""" +reproduce = """1. Perform live evacuation of compute node +2. Observe failure in approximately 1/1000 migrations with error message +3. Synthetic reproducer available via GDB in comments""" +additional = """Affects large scale clusters with heavy migration activity. Impacts libvirt and OpenStack migration monitoring. Difficult to reproduce consistently.""" +mistranslation = "no" |