author     Christian Krinitsin <mail@krinitsin.com>  2025-05-29 17:10:08 +0200
committer  Christian Krinitsin <mail@krinitsin.com>  2025-05-29 17:10:08 +0200
commit     dbbaa64f16cef5a2b32056a67433116dab84ab81
tree       fcd4509cb4ae6ce059dfe4850fee213e1a12aee3
parent     ad77852392240639b9db7b18f8566bd458a20ade
download   emulator-bug-study-dbbaa64f16cef5a2b32056a67433116dab84ab81.tar.gz
           emulator-bug-study-dbbaa64f16cef5a2b32056a67433116dab84ab81.zip

first version of categories in the classifier

-rwxr-xr-x  classification/main.py | 19
-rw-r--r--  classification/test_mails/mail_other_1 (renamed from classification/mail) | 0
-rw-r--r--  classification/test_mails/mail_other_2 | 83
-rw-r--r--  classification/test_mails/mail_other_3 | 24
-rw-r--r--  classification/test_mails/mail_semantic_1 | 21
-rw-r--r--  classification/test_mails/mail_semantic_2 | 15

6 files changed, 156 insertions, 6 deletions

diff --git a/classification/main.py b/classification/main.py
index 04f2d8c4..2a6f6d9a 100755
--- a/classification/main.py
+++ b/classification/main.py
@@ -1,10 +1,17 @@
 from transformers import pipeline
+from os import listdir, path
 
+directory : str = "./test_mails"
 classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
-with open("test", "r") as file:
-    sequence_to_classify = file.read()
-candidate_labels = ['semantic bug', 'no semantic bug']
-result = classifier(sequence_to_classify, candidate_labels, multi_label=False)
 
-print(result['labels'])
-print(result['scores'])
+for name in listdir(directory):
+    with open(path.join(directory, name), "r") as file:
+        sequence_to_classify = file.read()
+
+    candidate_labels = ['semantic', 'other', 'mistranslation', 'instruction']
+    result = classifier(sequence_to_classify, candidate_labels, multi_label=True)
+
+    print(name)
+    for label, score in zip(result["labels"], result["scores"]):
+        print(f"{label}: {score:.3f}")
+    print("\n")
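
The loop above prints per-label scores, and the test files in this commit appear to carry their intended category in the file name (`mail_other_2`, `mail_semantic_1`). A minimal sketch of how that could be turned into an automatic check follows. The `mail_<label>_<n>` naming convention is inferred from the file names only, and the helpers `expected_label`, `top_label` and `is_correct` are hypothetical and not part of the commit. The sketch assumes only the result shape that `main.py` already reads: a dict with parallel `"labels"` and `"scores"` lists.

```python
from typing import Mapping, Optional, Sequence

# Same labels that main.py passes to the classifier.
CANDIDATE_LABELS = ["semantic", "other", "mistranslation", "instruction"]


def expected_label(filename: str) -> Optional[str]:
    """Return the label embedded in a name such as 'mail_semantic_1'.

    Assumes the (inferred) convention 'mail_<label>_<n>'; returns None
    for any name that does not follow it.
    """
    parts = filename.split("_")
    if len(parts) == 3 and parts[0] == "mail" and parts[1] in CANDIDATE_LABELS:
        return parts[1]
    return None


def top_label(result: Mapping[str, Sequence]) -> str:
    """Return the label with the highest score, without relying on ordering."""
    labels, scores = result["labels"], result["scores"]
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best_index]


def is_correct(filename: str, result: Mapping[str, Sequence]) -> Optional[bool]:
    """True/False if the file name carries a label, None if it does not."""
    wanted = expected_label(filename)
    if wanted is None:
        return None
    return top_label(result) == wanted
```

Inside the loop in `main.py`, `is_correct(name, result)` could be printed next to the scores to show at a glance which test mails land in their intended category.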
diff --git a/classification/mail b/classification/test_mails/mail_other_1
index f4a85532..f4a85532 100644
--- a/classification/mail
+++ b/classification/test_mails/mail_other_1
diff --git a/classification/test_mails/mail_other_2 b/classification/test_mails/mail_other_2
new file mode 100644
index 00000000..df6aceba
--- /dev/null
+++ b/classification/test_mails/mail_other_2
@@ -0,0 +1,83 @@
+qemu-aarch64-static segfaults running ldconfig.real (amd64 host)
+[ Impact ]
+
+ * QEMU crashes when running (emulating) ldconfig in a Ubuntu 22.04 arm64 guest
+
+ * This affects the qemu-user-static 1:8.2.2+ds-0ubuntu1 package on Ubuntu 24.04+, running on a amd64 host.
+
+ * When running docker containers with Ubuntu 22.04 in them, emulating arm64 with qemu-aarch64-static, invocations of ldconfig (actually ldconfig.real) segfault, leading to problems when loading shared libraries.
+
+[ Test Plan ]
+
+ * Reproducer is very easy:
+
+$ sudo snap install docker
+docker 27.5.1 from Canonical** installed
+$ docker run -ti --platform linux/arm64/v8 ubuntu:22.04
+Unable to find image 'ubuntu:22.04' locally
+22.04: Pulling from library/ubuntu
+0d1c17d4e593: Pull complete
+Digest: sha256:ed1544e454989078f5dec1bfdabd8c5cc9c48e0705d07b678ab6ae3fb61952d2
+Status: Downloaded newer image for ubuntu:22.04
+
+# Execute ldconfig.real inside the arm64 guest.
+# This should not crash after the fix!
+root@ad80af5378dc:/# /sbin/ldconfig.real
+qemu: uncaught target signal 11 (Segmentation fault) - core dumped
+Segmentation fault (core dumped)
+
+[ Where problems could occur ]
+
+ * This changes the alignment of sections in the ELF binary via QEMUs elfloader, if something goes wrong with this change, it could lead to all kind of crashes (segfault) of any emulated binaries.
+
+[ Other Info ]
+
+ * Upstream bug: https://gitlab.com/qemu-project/qemu/-/issues/1913
+ * Upstream fix: https://gitlab.com/qemu-project/qemu/-/commit/4b7b20a3
+   - Fix dependency (needed for QEMU < 9.20): https://gitlab.com/qemu-project/qemu/-/commit/c81d1faf
+
+--- original bug report ---
+
+This affects the qemu-user-static 1:8.2.2+ds-0ubuntu1 package on Ubuntu 24.04, running on a amd64 host.
+
+When running docker containers with Ubuntu 22.04 in them, emulating arm64 with qemu-aarch64-static, invocations of ldconfig (actually ldconfig.real) segfault. For example:
+
+$ docker run -ti --platform linux/arm64/v8 ubuntu:22.04
+root@8861ff640a1c:/# /sbin/ldconfig.real
+Segmentation fault
+
+If you copy the ldconfig.real binary to the host, and run it directly via qemu-aarch64-static:
+
+$ gdb --args qemu-aarch64-static ./ldconfig.real
+GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
+Copyright (C) 2024 Free Software Foundation, Inc.
+License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+Type "show copying" and "show warranty" for details.
+This GDB was configured as "x86_64-linux-gnu".
+Type "show configuration" for configuration details.
+For bug reporting instructions, please see:
+<https://www.gnu.org/software/gdb/bugs/>.
+Find the GDB manual and other documentation resources online at:
+    <http://www.gnu.org/software/gdb/documentation/>.
+
+For help, type "help".
+Type "apropos word" to search for commands related to "word"...
+Reading symbols from qemu-aarch64-static...
+Reading symbols from /home/dim/.cache/debuginfod_client/86579812b213be0964189499f62f176bea817bf2/debuginfo...
+(gdb) r
+Starting program: /usr/bin/qemu-aarch64-static ./ldconfig.real
+[Thread debugging using libthread_db enabled]
+Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
+[New Thread 0x7ffff76006c0 (LWP 28378)]
+
+Thread 1 "qemu-aarch64-st" received signal SIGSEGV, Segmentation fault.
+0x00007fffe801645b in ?? ()
+(gdb) disassemble
+No function contains program counter for selected frame.
+
+It looks like this is a known qemu regression after v8.1.1:
+https://gitlab.com/qemu-project/qemu/-/issues/1913
+
+Downgrading the package to qemu-user-static_8.0.4+dfsg-1ubuntu3_amd64.deb fixes the segfault.
diff --git a/classification/test_mails/mail_other_3 b/classification/test_mails/mail_other_3
new file mode 100644
index 00000000..504ddc48
--- /dev/null
+++ b/classification/test_mails/mail_other_3
@@ -0,0 +1,24 @@
+
+This fix is fine for me...at least from SDM, HTT depends on topology and
+it should exist when user sets "-smp 4".
+
+I haven't found any other thread :-).
+
+By the way, just curious, in what cases do you need to disbale the HT
+flag? "-smp 4" means 4 cores with 1 thread per core, and is it not
+enough?
+
+As for the “-ht” behavior, I'm also unsure whether this should be fixed
+or not - one possible consideration is whether “-ht” would be useful.
+
+This fix is fine for me...at least from SDM, HTT depends on topology and
+it should exist when user sets "-smp 4".
+
+I haven't found any other thread :-).
+
+By the way, just curious, in what cases do you need to disbale the HT
+flag? "-smp 4" means 4 cores with 1 thread per core, and is it not
+enough?
+
+As for the “-ht” behavior, I'm also unsure whether this should be fixed
+or not - one possible consideration is whether “-ht” would be useful.
diff --git a/classification/test_mails/mail_semantic_1 b/classification/test_mails/mail_semantic_1
new file mode 100644
index 00000000..af6a2480
--- /dev/null
+++ b/classification/test_mails/mail_semantic_1
@@ -0,0 +1,21 @@
+AArch64: ISV is set to 1 in ESR_EL2 when taking a data abort with post-indexed instructions
+
+I think that I have a Qemu bug in my hands, but, I could still be missing something. Consider the following instruction:
+0x0000000000000000:  C3 44 00 B8    str   w3, [x6], #4
+
+notice the last #4, I think this is what we would call a post-indexed instruction (falls into the category of instructions with writeback). As I understand it, those instructions should not have ISV=1 in ESR_EL2 when faulting.
+
+Here is the relevant part of the manual:
+
+For other faults reported in ESR_EL2, ISV is 0 except for the following stage 2 aborts:
+• AArch64 loads and stores of a single general-purpose register (including the register specified with 0b11111, including those with Acquire/Release semantics, but excluding Load Exclusive or Store Exclusive and excluding those with writeback).
+
+
+However, I can see that Qemu sets ISV to 1 here. The ARM hardware that I tested gave me a value of ISV=0 for similar instructions.
+
+Another example of instruction: 0x00000000000002f8:  01 1C 40 38    ldrb  w1, [x0, #1]!"""
+reproduce = """1. Run some hypervisor in EL2
+2. Create a guest running at EL1 that executes one of the mentioned instructions (and make the instruction fault by writing to some unmapped page in SLP)
+3. Observe the value of ESR_EL2 on data abort
+
+Unfortunately, I cannot provide an image to reproduce this (the software is not open-source). But, I would be happy to help test a patch.
diff --git a/classification/test_mails/mail_semantic_2 b/classification/test_mails/mail_semantic_2
new file mode 100644
index 00000000..4c78171d
--- /dev/null
+++ b/classification/test_mails/mail_semantic_2
@@ -0,0 +1,15 @@
+x86 BLSMSK semantic bug
+description = """The result of instruction BLSMSK is different with from the CPU. The value of CF is different."""
+reproduce = """1. Compile this code
+void main() {
+    asm("mov rax, 0x65b2e276ad27c67");
+    asm("mov rbx, 0x62f34955226b2b5d");
+    asm("blsmsk eax, ebx");
+}
+
+2. Execute and compare the result with the CPU.
+    - CPU
+        - CF = 0
+    - QEMU
+        - CF = 1"""
+additional = """This bug is discovered by research conducted by KAIST SoftSec."""