1 files changed, 346 insertions, 0 deletions
diff --git a/results/classifier/105/other/1777777 b/results/classifier/105/other/1777777
new file mode 100644
index 000000000..bde193f98
--- /dev/null
+++ b/results/classifier/105/other/1777777
@@ -0,0 +1,346 @@
+other: 0.809
+mistranslation: 0.766
+semantic: 0.752
+device: 0.736
+instruction: 0.710
+graphic: 0.706
+boot: 0.701
+KVM: 0.691
+assembly: 0.686
+vnc: 0.661
+network: 0.616
+socket: 0.527
+
+arm9 clock pending (SP804)
+
+Hello all,
+
+I'm using the versatilepb board and the timer Interrupt Mask Status register (offset 0x14 of the SP804) does not seem to be working properly on the latest qemu-2.12. I tried on the 2.5 (i believe this is the mainstream version that comes with Linux) and it works perfectly.
+
+What happens is that the pending bit does not seem to be set in some scenarios. In my case, I see the timer value decreasing to 0 and then being reset to the reload value and neither does the interrupt is triggered nor the pending bit is set.
+
+I believe this is a matter of timing since in the "long" run the system eventually catches up (after a few microseconds).
+
+Thank you
+
+Do you have a test case image you can provide that reproduces the problem?
+
+
+Hello,
+
+Let me explain a bit how to reproduce the problem. I'm developing a RTOS with qemu (and boards,etc) - https://sourceforge.net/projects/rtospharos/. The first version used ARM926 with the versatilepb board and I used the qemu 2.5 version.
+
+In my test (to check that the RTOS is working correctly) I produced a loop that is always reading the current time ("pharosClockGet") and checking that it is always increasing. This works perfectly on qemu 2.5. When qemu 2.12 was released I updated to the new version to add support for the raspberry pi 3. 
+
+Here is a piece of the test:
+
+
+void aperiodicThread0_0()
+{
+    uint64_t microseconds;
+    uint64_t previousMicroseconds = 0;
+    uint32_t i;
+
+    for(i = 0; i < NUMBER_REPETITIONS; i++)
+    {
+        microseconds = pharosClockGetSinceBoot();
+        if(previousMicroseconds > microseconds) 
+        {
+            print("ERROR! Clock got lower\r\n");
+            break;
+        }
+    }
+}
+
+
+I'll show you the "internals" of Pharos next (this is file clock.c):
+
+
+uint64_t pharosIClockGetSinceBoot(void)
+{
+    /* microseconds since last clock tick*/
+    uint32_t microseconds;
+
+    /* number of clock ticks since boot */
+    ClockTick clockTicks;
+
+    /* >> interrupts are disabled when we get here << */
+
+    /* read the number of microseconds on the counter */
+    clockTicks = pharosVClockTicks;
+
+    /* read the timer value */
+    microseconds = pharosCpuClockReadCounter();
+
+    /* if the clock interrupt is pending we might have read the microseconds before or after the clock tick occurred (but not processed since interrupts are disabled) */
+    if(pharosCpuClockIsrIsPending() == TRUE)
+    {
+        /* interrupt is pending, so read again the microseconds and subtract 2 clock ticks */
+        microseconds = (2U * pharosIMicrosecondsPerClockTick()) - pharosCpuClockReadCounter();
+    }
+    else
+    {
+        /* the timer counts backwards so we must subtract to get the number of microseconds since last tick */
+        microseconds = pharosIMicrosecondsPerClockTick() - microseconds;
+    }
+
+    /* calculate the result using the last us read */
+    return (uint64_t) ((pharosIMicrosecondsPerClockTick() * clockTicks) + microseconds);
+}
+
+
+And the part that is hw dependent (for ARM926) - this is in file ClockSp804.h: 
+
+INLINE bool armSp804TimerInterruptIsPending(const ptrArmSp804Timer c)
+{
+        /* get the first bit of the interrupt status (ignore other bits) to check if the interrupt is pending */
+        return (c->timerInterruptMaskStatus & 0x1U) == 1U ? TRUE : FALSE;
+}
+
+
+I hope the code is easy enough to read.
+Thank you
+
+
+Sorry, I should have been clearer. Please can you provide a test case binary and QEMU command line that reproduces the problem (including what the expected output is and what the actual output is)? 
+
+Mark this bug as Incomplete as we're still waiting for the test case binary and command line to reproduce it.
+
+
+[Expired for QEMU because there has been no activity for 60 days.]
+
+Sorry to have taken some time to answer.
+
+Created the binary with the test.
+
+Here is the output when it fails (tested in Qemu 2.12):
+
+Checked 0 times if the time is always increasing
+Checked 100000 times if the time is always increasing
+Checked 200000 times if the time is always increasing
+Checked 300000 times if the time is always increasing
+Checked 400000 times if the time is always increasing
+Checked 500000 times if the time is always increasing
+Checked 600000 times if the time is always increasing
+Checked 700000 times if the time is always increasing
+Error: the time read is smaller than the previous time read
+
+
+And here with the latest Qemu (4.1.0):
+
+Checked 0 times if the time is always increasing
+Checked 100000 times if the time is always increasing
+Error: the time read is smaller than the previous time read
+
+
+
+The error can occur basically at any time. But when tested with Qemu 2.5 it goes well:
+
+Checked 0 times if the time is always increasing
+Checked 100000 times if the time is always increasing
+Checked 200000 times if the time is always increasing
+Checked 300000 times if the time is always increasing
+Checked 400000 times if the time is always increasing
+Checked 500000 times if the time is always increasing
+Checked 600000 times if the time is always increasing
+Checked 700000 times if the time is always increasing
+Checked 800000 times if the time is always increasing
+Checked 900000 times if the time is always increasing
+Checked 1000000 times if the time is always increasing
+Checked 1100000 times if the time is always increasing
+Checked 1200000 times if the time is always increasing
+Checked 1300000 times if the time is always increasing
+Checked 1400000 times if the time is always increasing
+Checked 1500000 times if the time is always increasing
+Checked 1600000 times if the time is always increasing
+Checked 1700000 times if the time is always increasing
+Checked 1800000 times if the time is always increasing
+Checked 1900000 times if the time is always increasing
+Checked 2000000 times if the time is always increasing
+Checked 2100000 times if the time is always increasing
+Checked 2200000 times if the time is always increasing
+Checked 2300000 times if the time is always increasing
+Checked 2400000 times if the time is always increasing
+Checked 2500000 times if the time is always increasing
+Checked 2600000 times if the time is always increasing
+Checked 2700000 times if the time is always increasing
+Checked 2800000 times if the time is always increasing
+Checked 2900000 times if the time is always increasing
+.
+.
+.
+Checked 9500000 times if the time is always increasing
+Checked 9600000 times if the time is always increasing
+Checked 9700000 times if the time is always increasing
+Checked 9800000 times if the time is always increasing
+Checked 9900000 times if the time is always increasing
+Example finished
+
+
+
+Just to give some background on the test itself, the main.c file contains this:
+
+void helloWorld0()
+{
+    uint64_t microseconds;
+    uint64_t previousMicroseconds = 0;
+    uint32_t i;
+
+    for(i = 0; i < NUMBER_REPETITIONS; i++)
+    {
+        microseconds = pharosClockGetSinceBoot();
+
+        if(microseconds < previousMicroseconds)
+        {
+            printk("Error: the time read is smaller than the previous time read\r\n");
+            pharosThreadSuspendSelf();
+        }
+
+        previousMicroseconds = microseconds;
+
+        /* only print 100 times the message */
+        if(i % (NUMBER_REPETITIONS / 100) == 0)
+        {
+            printk("Checked %d times if the time is always increasing\r\n" , i);
+        }
+    }
+
+    printk("Example finished\r\n");
+    pharosThreadSuspendSelf();
+}
+
+
+And where I suspect the code is reading the wrong hardware level is here:
+
+bool pharosCpuClockIsrIsPending(void)
+{
+    /* apparently QEMU has a bug when it comes to reading the SP804 interrupt status. It seems that we have to read
+      it multiple times in order for it to give the correct result since it does not update the interrupt status 
+      of the SP804 when it should. Qemu 2.5 did not have this problem but at least Qemu 2.12, 3.1 and 4.1.0 have this
+      problem. So we just create a work-around for it. */
+#if 0
+
+    /* number of times to read it. */
+#define N 100
+
+    uint32_t i;
+
+    /* read the interrupts status a lot of times to wait for QEMU to update the status */
+    for(i = 0; i < N; i++)
+    {
+        /* if there is an interrupt */
+        if(armSp804TimerInterruptIsPending(TIMER1_BASE_ADDRESS) == TRUE)
+        {
+            /* return right away */
+            return TRUE;
+        }
+    }
+
+    /* read the status a LOT of times, there is no interrupt pending */
+    return FALSE;
+
+    /* On a real hardware this problem should not occur so we only have to read it once */
+#else
+    /* read the SP804 timer interrupt pending info */
+    return armSp804TimerInterruptIsPending(TIMER1_BASE_ADDRESS);
+#endif
+}
+
+that is, the armSp804TimerInterruptIsPending is not correctly returning if the timer1 has a pending interrupt (even though I check the timer load value and it was reset just before and no interrupt was raised - interrupts disabled)
+
+
+Forgot to state the qemu argument, it's this one:
+
+~/cross-compiler/qemu/2.5/bin/qemu-system-arm -M versatilepb -m 128M -nographic -kernel myApplication
+
+
+Thanks for the repro case. Preliminary analysis: I added some tracepoints to the sp804 code, and you're right that the value of the interrupt status register isn't always correct:
+
+A normal attempt to read the clock looks like this:
+
+Taking exception 2 [SVC]
+...from EL0 to EL1
+...with ESR 0x11/0x4600000b
+AArch32 mode switch from svc to sys PC 0x12010
+AArch32 mode switch from sys to svc PC 0x12018
+32696@1569513406.450378:sp804_read addr 0x00000004 value 0x00000681
+32696@1569513406.450384:sp804_read addr 0x00000014 value 0x00000000
+Exception return from AArch32 svc to usr PC 0x10ff0
+
+Sometimes we might read the clock when it's exactly got down to 0 (this is more likely on QEMU than on real h/w for internal reasons of our implementation):
+Taking exception 2 [SVC]
+...from EL0 to EL1
+...with ESR 0x11/0x4600000b
+AArch32 mode switch from svc to sys PC 0x12010
+AArch32 mode switch from sys to svc PC 0x12018
+32696@1569513406.452273:sp804_read addr 0x00000004 value 0x00000000
+32696@1569513406.452279:sp804_read addr 0x00000014 value 0x00000000
+Exception return from AArch32 svc to usr PC 0x10ff0
+
+A correct handling of the rollover looks like this (we read the counter, which is rolled over, and the interrupt-status, which is 1, which causes us to reread the counter; once we're done the IRQ handler itself runs):
+
+4003@1569514474.944756:sp804_read addr 0x00000004 value 0x0000c29d
+4001@1569514474.944761:sp804_arm_timer_update level 1
+4003@1569514474.944797:sp804_read addr 0x00000014 value 0x00000001
+4003@1569514474.944828:sp804_read addr 0x00000004 value 0x0000c255
+Taking exception 5 [IRQ]
+...from EL1 to EL1
+...with ESR 0x11/0x4600000b
+AArch32 mode switch from irq to svc PC 0x2a5e4
+4003@1569514474.944943:sp804_read addr 0x00000014 value 0x00000001
+4003@1569514474.944957:sp804_read addr 0x00000034 value 0x00000000
+4003@1569514474.944962:sp804_read addr 0x00000014 value 0x00000001
+4003@1569514474.944965:sp804_write addr 0x0000000c value 0x00000000
+4003@1569514474.944966:sp804_arm_timer_update level 0
+4003@1569514474.944969:sp804_read addr 0x00000034 value 0x00000000
+AArch32 mode switch from svc to irq PC 0x2a718
+Exception return from AArch32 irq to svc PC 0x2a3bc
+Exception return from AArch32 svc to usr PC 0x10ff0
+
+But sometimes we get this, where we read the rolled-over counter value but the interrupt-status register is still 0:
+
+Taking exception 2 [SVC]
+...from EL0 to EL1
+...with ESR 0x11/0x4600000b
+AArch32 mode switch from svc to sys PC 0x12010
+AArch32 mode switch from sys to svc PC 0x12018
+4003@1569514475.794690:sp804_read addr 0x00000004 value 0x0000c2df
+4003@1569514475.794698:sp804_read addr 0x00000014 value 0x00000000
+4001@1569514475.794703:sp804_arm_timer_update level 1
+Exception return from AArch32 svc to usr PC 0x10ff0
+Taking exception 5 [IRQ]
+...from EL0 to EL1
+...with ESR 0x11/0x4600000b
+AArch32 mode switch from irq to svc PC 0x2a5e4
+4003@1569514475.794768:sp804_read addr 0x00000014 value 0x00000001
+4003@1569514475.794937:sp804_read addr 0x00000034 value 0x00000000
+4003@1569514475.794944:sp804_read addr 0x00000014 value 0x00000001
+4003@1569514475.794947:sp804_write addr 0x0000000c value 0x00000000
+4003@1569514475.794949:sp804_arm_timer_update level 0
+4003@1569514475.794952:sp804_read addr 0x00000034 value 0x00000000
+AArch32 mode switch from svc to irq PC 0x2a718
+Exception return from AArch32 irq to usr PC 0x10ff0
+
+
+
+This happens because the sp804 uses a QEMU timer abstraction "ptimer". The ptimer updates its internal state when a raw QEMU timer expires and calls the ptimer_tick() function. From this point on, a guest read of the counter value will get the rolled-over value, because the sp804 implements this as a simple ptimer_get_count(). However, the ptimer doesn't immediately call the sp804's arm_timer_tick() function (which is where we update the interrupt-status flag and arrange for an IRQ to be delivered) -- it just schedules that to happen later via a QEMU "bottom half handler". Unfortunately it's possible for the guest CPU to run between when the ptimer's ptimer_tick() happens and when the bottom-half-handler is triggered, which means that the guest can see this incorrectly out-of-sync state from the sp804 device.
+
+I'm not currently sure how best to fix this.
+
+
+Thank you for checking this issue.
+
+I looked at the ptimer code, like you said. Just one question: isn't this used by other hw as well?
+Maybe this problem is more general...
+
+I also tried (basically) the same example on a aarch64 (raspberry pi3), and I don't find any problems there. Maybe could be helpful to look how this problem is solved there.
+
+Just trying to help
+
+I sent out an initial RFC patchset which fixes this (it's just an RFC because it only converts this one device to the new ptimer API, and we should do a proper conversion of all devices); it seems to make the test case in this bug work correctly:
+https://<email address hidden>/
+
+
+Fix has been included in QEMU v4.2:
+https://git.qemu.org/?p=qemu.git;a=commitdiff;h=5a65f7b5f4907ca70cb6
+