| author | Christian Krinitsin <mail@krinitsin.com> | 2025-07-03 07:27:52 +0000 |
|---|---|---|
| committer | Christian Krinitsin <mail@krinitsin.com> | 2025-07-03 07:27:52 +0000 |
| commit | d0c85e36e4de67af628d54e9ab577cc3fad7796a (patch) | |
| tree | f8f784b0f04343b90516a338d6df81df3a85dfa2 /results/classifier/gemma3:12b/kernel/1570 | |
| parent | 7f4364274750eb8cb39a3e7493132fca1c01232e (diff) | |
add deepseek and gemma results
Diffstat (limited to 'results/classifier/gemma3:12b/kernel/1570')
| -rw-r--r-- | results/classifier/gemma3:12b/kernel/1570 | 64 |
1 files changed, 64 insertions, 0 deletions
diff --git a/results/classifier/gemma3:12b/kernel/1570 b/results/classifier/gemma3:12b/kernel/1570
new file mode 100644
index 00000000..c4463183
--- /dev/null
+++ b/results/classifier/gemma3:12b/kernel/1570
@@ -0,0 +1,64 @@
+
+Incorrect memory handling when booting redox
+Description of problem:
+During the boot of redox, I regularly get one of two errors when reading the HPET at base address `0xfed00000`:
+- Incorrect translation from virtual address `0xffff8000fed00108` to random physical addresses, e.g. `0xfec00108`
+- Invalid read at addr 0x0, size 8, region 'hpet', reason: invalid size (min:4 max:4)
+Steps to reproduce:
+1. Build the server version of the redox OS as per [the instructions](https://doc.redox-os.org/book/ch02-05-building-redox.html).
+2. Run the qemu command line with multiple CPUs. The more CPUs the easier it is to reproduce.
+3. The problem will manifest itself as a divide by zero error. See the corresponding [redox bug report](https://gitlab.redox-os.org/redox-os/kernel/-/issues/116).
+Additional information:
+The best evidence I have is a debug line I added to qemu before [the memory_region_dispatch_read line](https://gitlab.com/qemu-project/qemu/-/blob/master/accel/tcg/cputlb.c#L1375):
+
+```
+if ((mr_offset & 0x1ff) == 0x108) fprintf(stderr, "cputlb io_readx cpu %d addr=%llx mr_offset=%llx mr=%p mr->addr=%llx\n", current_cpu->cpu_index, addr, mr_offset, mr, mr->addr);
+r = memory_region_dispatch_read(mr, mr_offset, &val, op, full->attrs);
+```
+
+That logs:
+
+```
+cputlb io_readx cpu 0 addr=ffff8000fed00108 mr_offset=108 mr=0x7fefb60d5720 mr->addr=fec00000
+```
+
+The expected physical address is `0xfed00000` instead of `0xfec00000`.
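The address mismatch described above can be worked through numerically. The following is a minimal sketch, not part of the original report: it assumes the guest maps physical memory at a fixed virtual offset of `0xffff800000000000` (inferred from the virtual/physical address pairs quoted in this report, not confirmed from the redox source), and the names `PHYS_MAP_OFFSET`, `expected_phys`, and `describe` are hypothetical.

```python
# Assumed direct-map offset, inferred from pairs such as
# 0xffff8000fed00000 -> 0xfed00000 quoted in the report (an assumption).
PHYS_MAP_OFFSET = 0xFFFF_8000_0000_0000

HPET_BASE = 0xFED0_0000    # HPET base address stated in the report
IOAPIC_BASE = 0xFEC0_0000  # base of the region the read was dispatched to


def expected_phys(vaddr: int) -> int:
    """Physical address expected for a direct-mapped virtual address."""
    return vaddr - PHYS_MAP_OFFSET


def describe(vaddr: int, observed_base: int) -> dict:
    """Compare the expected HPET access with the region actually hit."""
    phys = expected_phys(vaddr)
    offset = phys - HPET_BASE  # offset of the access inside the HPET region
    return {
        "expected_phys": phys,
        "region_offset": offset,
        "observed_phys": observed_base + offset,
        "base_delta": HPET_BASE - observed_base,
    }


info = describe(0xFFFF8000FED00108, IOAPIC_BASE)
for key, value in info.items():
    print(f"{key}: {value:#x}")
```

Under that assumption, the region offset (`0x108`) is preserved while the region base is wrong, which is consistent with the logged `mr_offset=108` and `mr->addr=fec00000`.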
+
+A more extensive log is this one:
+```
+55027@1680283224.671665:memory_region_ops_read cpu 5 mr 0x7f9950890130 addr 0xfed000f0 value 0x949707cc size 4 name 'hpet' <- ok
+55027@1680283224.671681:memory_region_ops_read cpu 5 mr 0x7f9950890130 addr 0xfed000f4 value 0x0 size 4 name 'hpet' <- ok
+tlb_set_page_full: vaddr=0000000000474000 paddr=0x000000000536f000 prot=5 idx=1
+...
+tlb_flush_by_mmuidx_async_work: mmu_idx:0xffff
+tlb_flush_by_mmuidx_async_work: mmu_idx:0xffff
+tlb_flush_by_mmuidx_async_work: mmu_idx:0xffff
+tlb_flush_by_mmuidx_async_work: mmu_idx:0xffff
+...
+55027@1680283224.671951:memory_region_ops_read cpu 5 mr 0x7f9950882930 addr 0xfec00108 value 0x0 size 4 name 'ioapic' <- wrong
+55027@1680283224.671958:memory_region_ops_read cpu 5 mr 0x7f9950882930 addr 0xfec0010c value 0x0 size 4 name 'ioapic'
+55027@1680283224.671967:memory_region_ops_write cpu 2 mr 0x7f994d808d30 addr 0xcf8 value 0x8000fa80 size 4 name 'pci-conf-idx'
+55027@1680283224.671986:memory_region_ops_read cpu 2 mr 0x7f994d808e40 addr 0xcfc value 0x80a805 size 4 name 'pci-conf-data'
+55027@1680283224.672001:memory_region_ops_read cpu 5 mr 0x7f9950882930 addr 0xfec00000 value 0x0 size 4 name 'ioapic' <- wrong
+55027@1680283224.672010:memory_region_ops_read cpu 5 mr 0x7f9950882930 addr 0xfec00004 value 0x0 size 4 name 'ioapic'
+```
+
+Some observations
+- ~I seem to be the only one having this issue. Perhaps because I am the only one developing on MacOS. Maybe it's because I'm running an older intel mac.~. I managed to reproduce this on a Asus vivobook running linux
+- The redox OS [reads the HPET](https://gitlab.redox-os.org/redox-os/kernel/-/blob/master/src/arch/x86_64/time.rs#L11) at addresses `0xf4`, `0x108`, `0x00` in that order. If I change the order to `0x00`, `0xf4`, `0x108`, the problem goes away.
+- Even if I work around the problem by changing the order of the reads, the OS still randomly crashes. This could be related, but I can only speculate on that right now.
+- Increasing qemu debug logging tends to push the problem to the 4vs8 size problem instead of the incorrect address one. The more logging, the more difficult it is to reproduce.
+- I tried to bisect the issue and found I could only reproduce it after qemu version 5.2. However, the mac build broke during this process so I could not find the causal commit. Between 5.1 and 5.2 the performance is greatly increased though and I suspect whatever changed there caused the issue.
+- I can't reproduce the problem with -smp 1
+- I have seen qemu segfault occasionally, but I didn't look further into it and I don't know if it's related to this issue.
+- I have attempted to rule out a bug in redox. I am fairly certain nothing strange is going on there, but I can't say for sure.
+- When I trigger the incorrect address bug, I mostly get a base address of `0xfec00000` which is the IO APIC. However, I do occasionally see other addresses too
+- `info tlb` at the time of the fault shows
+  ```
+  ffff8000fd3e6000: 00000000fd3e6000 X--DA---W
+  ffff8000fd3e7000: 00000000fd3e7000 X--DA---W
+  ffff8000fed00000: 00000000fed00000 X--DAC--W
+  ffff8000fee00000: 00000000fee00000 X--DA---W
+  fffffd8000000000: 0000000001e32000 XG-DA---W
+  fffffd8000001000: 0000000001e36000 XG-DA---W
+  ```
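Spotting misdirected reads by eye in a long trace is tedious, so a filter along the following lines may help. This is a rough sketch and not part of the original report. It assumes the `memory_region_ops_read` line format shown in the log above, and it uses a simple heuristic: a read is flagged when it comes from a CPU already seen reading the `hpet` region, lands in a different region, and has a page offset matching one of the HPET offsets seen in this report (treating 8-byte accesses as two 4-byte halves). The heuristic can produce false positives, for example a legitimate IO APIC read at offset `0x00`. The names `TRACE_RE`, `SUSPECT_OFFSETS`, and `find_suspect_reads` are hypothetical.

```python
import re

# Matches the trace line format quoted in the report.
TRACE_RE = re.compile(
    r"memory_region_ops_(?P<op>read|write) cpu (?P<cpu>\d+) mr 0x[0-9a-f]+ "
    r"addr (?P<addr>0x[0-9a-f]+) value 0x[0-9a-f]+ size (?P<size>\d+) "
    r"name '(?P<name>[^']+)'"
)

# HPET offsets seen in the report, plus the upper/lower 4-byte halves.
SUSPECT_OFFSETS = {0x00, 0x04, 0xF0, 0xF4, 0x108, 0x10C}


def find_suspect_reads(lines):
    """Return (cpu, addr, region) for reads that look like misrouted HPET reads."""
    hpet_cpus = set()  # CPUs that have already read the HPET region
    suspects = []
    for line in lines:
        m = TRACE_RE.search(line)
        if m is None or m["op"] != "read":
            continue
        cpu, addr, name = int(m["cpu"]), int(m["addr"], 16), m["name"]
        if name == "hpet":
            hpet_cpus.add(cpu)
        elif cpu in hpet_cpus and (addr & 0xFFF) in SUSPECT_OFFSETS:
            suspects.append((cpu, addr, name))
    return suspects


# Lines copied from the log in the report.
SAMPLE = [
    "55027@1680283224.671665:memory_region_ops_read cpu 5 mr 0x7f9950890130 addr 0xfed000f0 value 0x949707cc size 4 name 'hpet'",
    "55027@1680283224.671681:memory_region_ops_read cpu 5 mr 0x7f9950890130 addr 0xfed000f4 value 0x0 size 4 name 'hpet'",
    "55027@1680283224.671951:memory_region_ops_read cpu 5 mr 0x7f9950882930 addr 0xfec00108 value 0x0 size 4 name 'ioapic'",
    "55027@1680283224.671958:memory_region_ops_read cpu 5 mr 0x7f9950882930 addr 0xfec0010c value 0x0 size 4 name 'ioapic'",
    "55027@1680283224.671967:memory_region_ops_write cpu 2 mr 0x7f994d808d30 addr 0xcf8 value 0x8000fa80 size 4 name 'pci-conf-idx'",
    "55027@1680283224.671986:memory_region_ops_read cpu 2 mr 0x7f994d808e40 addr 0xcfc value 0x80a805 size 4 name 'pci-conf-data'",
    "55027@1680283224.672001:memory_region_ops_read cpu 5 mr 0x7f9950882930 addr 0xfec00000 value 0x0 size 4 name 'ioapic'",
    "55027@1680283224.672010:memory_region_ops_read cpu 5 mr 0x7f9950882930 addr 0xfec00004 value 0x0 size 4 name 'ioapic'",
]

suspects = find_suspect_reads(SAMPLE)
for cpu, addr, name in suspects:
    print(f"cpu {cpu}: read at {addr:#x} landed in '{name}'")
```

Restricting the check to CPUs that have already touched the HPET keeps unrelated traffic, such as the PCI configuration accesses from cpu 2, out of the result.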