diff options
Diffstat (limited to 'results/classifier/105/other/1777777')
| -rw-r--r-- | results/classifier/105/other/1777777 | 346 |
1 files changed, 346 insertions, 0 deletions
diff --git a/results/classifier/105/other/1777777 b/results/classifier/105/other/1777777 new file mode 100644 index 000000000..bde193f98 --- /dev/null +++ b/results/classifier/105/other/1777777 @@ -0,0 +1,346 @@ +other: 0.809 +mistranslation: 0.766 +semantic: 0.752 +device: 0.736 +instruction: 0.710 +graphic: 0.706 +boot: 0.701 +KVM: 0.691 +assembly: 0.686 +vnc: 0.661 +network: 0.616 +socket: 0.527 + +arm9 clock pending (SP804) + +Hello all, + +I'm using the versatilepb board and the timer Interrupt Mask Status register (offset 0x14 of the SP804) does not seem to be working properly on the latest qemu-2.12. I tried on the 2.5 (i believe this is the mainstream version that comes with Linux) and it works perfectly. + +What happens is that the pending bit does not seem to be set in some scenarios. In my case, I see the timer value decreasing to 0 and then being reset to the reload value and neither does the interrupt is triggered nor the pending bit is set. + +I believe this is a matter of timing since in the "long" run the system eventually catches up (after a few microseconds). + +Thank you + +Do you have a test case image you can provide that reproduces the problem? + + +Hello, + +Let me explain a bit how to reproduce the problem. I'm developing a RTOS with qemu (and boards,etc) - https://sourceforge.net/projects/rtospharos/. The first version used ARM926 with the versatilepb board and I used the qemu 2.5 version. + +In my test (to check that the RTOS is working correctly) I produced a loop that is always reading the current time ("pharosClockGet") and checking that it is always increasing. This works perfectly on qemu 2.5. When qemu 2.12 was released I updated to the new version to add support for the raspberry pi 3. + +Here is a piece of the test: + + +void aperiodicThread0_0() +{ + uint64_t microseconds; + uint64_t previousMicroseconds = 0; + uint32_t i; + + for(i = 0; i < NUMBER_REPETITIONS; i++) + { + microseconds = pharosClockGetSinceBoot(); + if(previousMicroseconds > microseconds) + { + print("ERROR! Clock got lower\r\n"); + break; + } + } +} + + +I'll show you the "internals" of Pharos next (this is file clock.c): + + +uint64_t pharosIClockGetSinceBoot(void) +{ + /* microseconds since last clock tick*/ + uint32_t microseconds; + + /* number of clock ticks since boot */ + ClockTick clockTicks; + + /* >> interrupts are disabled when we get here << */ + + /* read the number of microseconds on the counter */ + clockTicks = pharosVClockTicks; + + /* read the timer value */ + microseconds = pharosCpuClockReadCounter(); + + /* if the clock interrupt is pending we might have read the microseconds before or after the clock tick occurred (but not processed since interrupts are disabled) */ + if(pharosCpuClockIsrIsPending() == TRUE) + { + /* interrupt is pending, so read again the microseconds and subtract 2 clock ticks */ + microseconds = (2U * pharosIMicrosecondsPerClockTick()) - pharosCpuClockReadCounter(); + } + else + { + /* the timer counts backwards so we must subtract to get the number of microseconds since last tick */ + microseconds = pharosIMicrosecondsPerClockTick() - microseconds; + } + + /* calculate the result using the last us read */ + return (uint64_t) ((pharosIMicrosecondsPerClockTick() * clockTicks) + microseconds); +} + + +And the part that is hw dependent (for ARM926) - this is in file ClockSp804.h: + +INLINE bool armSp804TimerInterruptIsPending(const ptrArmSp804Timer c) +{ + /* get the first bit of the interrupt status (ignore other bits) to check if the interrupt is pending */ + return (c->timerInterruptMaskStatus & 0x1U) == 1U ? TRUE : FALSE; +} + + +I hope the code is easy enough to read. +Thank you + + +Sorry, I should have been clearer. Please can you provide a test case binary and QEMU command line that reproduces the problem (including what the expected output is and what the actual output is)? + +Mark this bug as Incomplete as we're still waiting for the test case binary and command line to reproduce it. + + +[Expired for QEMU because there has been no activity for 60 days.] + +Sorry to have taken some time to answer. + +Created the binary with the test. + +Here is the output when it fails (tested in Qemu 2.12): + +Checked 0 times if the time is always increasing +Checked 100000 times if the time is always increasing +Checked 200000 times if the time is always increasing +Checked 300000 times if the time is always increasing +Checked 400000 times if the time is always increasing +Checked 500000 times if the time is always increasing +Checked 600000 times if the time is always increasing +Checked 700000 times if the time is always increasing +Error: the time read is smaller than the previous time read + + +And here with the latest Qemu (4.1.0): + +Checked 0 times if the time is always increasing +Checked 100000 times if the time is always increasing +Error: the time read is smaller than the previous time read + + + +The error can occur basically at any time. But when tested with Qemu 2.5 it goes well: + +Checked 0 times if the time is always increasing +Checked 100000 times if the time is always increasing +Checked 200000 times if the time is always increasing +Checked 300000 times if the time is always increasing +Checked 400000 times if the time is always increasing +Checked 500000 times if the time is always increasing +Checked 600000 times if the time is always increasing +Checked 700000 times if the time is always increasing +Checked 800000 times if the time is always increasing +Checked 900000 times if the time is always increasing +Checked 1000000 times if the time is always increasing +Checked 1100000 times if the time is always increasing +Checked 1200000 times if the time is always increasing +Checked 1300000 times if the time is always increasing +Checked 1400000 times if the time is always increasing +Checked 1500000 times if the time is always increasing +Checked 1600000 times if the time is always increasing +Checked 1700000 times if the time is always increasing +Checked 1800000 times if the time is always increasing +Checked 1900000 times if the time is always increasing +Checked 2000000 times if the time is always increasing +Checked 2100000 times if the time is always increasing +Checked 2200000 times if the time is always increasing +Checked 2300000 times if the time is always increasing +Checked 2400000 times if the time is always increasing +Checked 2500000 times if the time is always increasing +Checked 2600000 times if the time is always increasing +Checked 2700000 times if the time is always increasing +Checked 2800000 times if the time is always increasing +Checked 2900000 times if the time is always increasing +. +. +. +Checked 9500000 times if the time is always increasing +Checked 9600000 times if the time is always increasing +Checked 9700000 times if the time is always increasing +Checked 9800000 times if the time is always increasing +Checked 9900000 times if the time is always increasing +Example finished + + + +Just to give some background on the test itself, the main.c file contains this: + +void helloWorld0() +{ + uint64_t microseconds; + uint64_t previousMicroseconds = 0; + uint32_t i; + + for(i = 0; i < NUMBER_REPETITIONS; i++) + { + microseconds = pharosClockGetSinceBoot(); + + if(microseconds < previousMicroseconds) + { + printk("Error: the time read is smaller than the previous time read\r\n"); + pharosThreadSuspendSelf(); + } + + previousMicroseconds = microseconds; + + /* only print 100 times the message */ + if(i % (NUMBER_REPETITIONS / 100) == 0) + { + printk("Checked %d times if the time is always increasing\r\n" , i); + } + } + + printk("Example finished\r\n"); + pharosThreadSuspendSelf(); +} + + +And where I suspect the code is reading the wrong hardware level is here: + +bool pharosCpuClockIsrIsPending(void) +{ + /* apparently QEMU has a bug when it comes to reading the SP804 interrupt status. It seems that we have to read + it multiple times in order for it to give the correct result since it does not update the interrupt status + of the SP804 when it should. Qemu 2.5 did not have this problem but at least Qemu 2.12, 3.1 and 4.1.0 have this + problem. So we just create a work-around for it. */ +#if 0 + + /* number of times to read it. */ +#define N 100 + + uint32_t i; + + /* read the interrupts status a lot of times to wait for QEMU to update the status */ + for(i = 0; i < N; i++) + { + /* if there is an interrupt */ + if(armSp804TimerInterruptIsPending(TIMER1_BASE_ADDRESS) == TRUE) + { + /* return right away */ + return TRUE; + } + } + + /* read the status a LOT of times, there is no interrupt pending */ + return FALSE; + + /* On a real hardware this problem should not occur so we only have to read it once */ +#else + /* read the SP804 timer interrupt pending info */ + return armSp804TimerInterruptIsPending(TIMER1_BASE_ADDRESS); +#endif +} + +that is, the armSp804TimerInterruptIsPending is not correctly returning if the timer1 has a pending interrupt (even though I check the timer load value and it was reset just before and no interrupt was raised - interrupts disabled) + + +Forgot to state the qemu argument, it's this one: + +~/cross-compiler/qemu/2.5/bin/qemu-system-arm -M versatilepb -m 128M -nographic -kernel myApplication + + +Thanks for the repro case. Preliminary analysis: I added some tracepoints to the sp804 code, and you're right that the value of the interrupt status register isn't always correct: + +A normal attempt to read the clock looks like this: + +Taking exception 2 [SVC] +...from EL0 to EL1 +...with ESR 0x11/0x4600000b +AArch32 mode switch from svc to sys PC 0x12010 +AArch32 mode switch from sys to svc PC 0x12018 +32696@1569513406.450378:sp804_read addr 0x00000004 value 0x00000681 +32696@1569513406.450384:sp804_read addr 0x00000014 value 0x00000000 +Exception return from AArch32 svc to usr PC 0x10ff0 + +Sometimes we might read the clock when it's exactly got down to 0 (this is more likely on QEMU than on real h/w for internal reasons of our implementation): +Taking exception 2 [SVC] +...from EL0 to EL1 +...with ESR 0x11/0x4600000b +AArch32 mode switch from svc to sys PC 0x12010 +AArch32 mode switch from sys to svc PC 0x12018 +32696@1569513406.452273:sp804_read addr 0x00000004 value 0x00000000 +32696@1569513406.452279:sp804_read addr 0x00000014 value 0x00000000 +Exception return from AArch32 svc to usr PC 0x10ff0 + +A correct handling of the rollover looks like this (we read the counter, which is rolled over, and the interrupt-status, which is 1, which causes us to reread the counter; once we're done the IRQ handler itself runs): + +4003@1569514474.944756:sp804_read addr 0x00000004 value 0x0000c29d +4001@1569514474.944761:sp804_arm_timer_update level 1 +4003@1569514474.944797:sp804_read addr 0x00000014 value 0x00000001 +4003@1569514474.944828:sp804_read addr 0x00000004 value 0x0000c255 +Taking exception 5 [IRQ] +...from EL1 to EL1 +...with ESR 0x11/0x4600000b +AArch32 mode switch from irq to svc PC 0x2a5e4 +4003@1569514474.944943:sp804_read addr 0x00000014 value 0x00000001 +4003@1569514474.944957:sp804_read addr 0x00000034 value 0x00000000 +4003@1569514474.944962:sp804_read addr 0x00000014 value 0x00000001 +4003@1569514474.944965:sp804_write addr 0x0000000c value 0x00000000 +4003@1569514474.944966:sp804_arm_timer_update level 0 +4003@1569514474.944969:sp804_read addr 0x00000034 value 0x00000000 +AArch32 mode switch from svc to irq PC 0x2a718 +Exception return from AArch32 irq to svc PC 0x2a3bc +Exception return from AArch32 svc to usr PC 0x10ff0 + +But sometimes we get this, where we read the rolled-over counter value but the interrupt-status register is still 0: + +Taking exception 2 [SVC] +...from EL0 to EL1 +...with ESR 0x11/0x4600000b +AArch32 mode switch from svc to sys PC 0x12010 +AArch32 mode switch from sys to svc PC 0x12018 +4003@1569514475.794690:sp804_read addr 0x00000004 value 0x0000c2df +4003@1569514475.794698:sp804_read addr 0x00000014 value 0x00000000 +4001@1569514475.794703:sp804_arm_timer_update level 1 +Exception return from AArch32 svc to usr PC 0x10ff0 +Taking exception 5 [IRQ] +...from EL0 to EL1 +...with ESR 0x11/0x4600000b +AArch32 mode switch from irq to svc PC 0x2a5e4 +4003@1569514475.794768:sp804_read addr 0x00000014 value 0x00000001 +4003@1569514475.794937:sp804_read addr 0x00000034 value 0x00000000 +4003@1569514475.794944:sp804_read addr 0x00000014 value 0x00000001 +4003@1569514475.794947:sp804_write addr 0x0000000c value 0x00000000 +4003@1569514475.794949:sp804_arm_timer_update level 0 +4003@1569514475.794952:sp804_read addr 0x00000034 value 0x00000000 +AArch32 mode switch from svc to irq PC 0x2a718 +Exception return from AArch32 irq to usr PC 0x10ff0 + + + +This happens because the sp804 uses a QEMU timer abstraction "ptimer". The ptimer updates its internal state when a raw QEMU timer expires and calls the ptimer_tick() function. From this point on, a guest read of the counter value will get the rolled-over value, because the sp804 implements this as a simple ptimer_get_count(). However, the ptimer doesn't immediately call the sp804's arm_timer_tick() function (which is where we update the interrupt-status flag and arrange for an IRQ to be delivered) -- it just schedules that to happen later via a QEMU "bottom half handler". Unfortunately it's possible for the guest CPU to run between when the ptimer's ptimer_tick() happens and when the bottom-half-handler is triggered, which means that the guest can see this incorrectly out-of-sync state from the sp804 device. + +I'm not currently sure how best to fix this. + + +Thank you for checking this issue. + +I looked at the ptimer code, like you said. Just one question: isn't this used by other hw as well? +Maybe this problem is more general... + +I also tried (basically) the same example on a aarch64 (raspberry pi3), and I don't find any problems there. Maybe could be helpful to look how this problem is solved there. + +Just trying to help + +I sent out an initial RFC patchset which fixes this (it's just an RFC because it only converts this one device to the new ptimer API, and we should do a proper conversion of all devices); it seems to make the test case in this bug work correctly: +https://<email address hidden>/ + + +Fix has been included in QEMU v4.2: +https://git.qemu.org/?p=qemu.git;a=commitdiff;h=5a65f7b5f4907ca70cb6 + |