results/classifier/zero-shot/118/all/1774677


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195

debug: 0.963
peripherals: 0.961
risc-v: 0.959
virtual: 0.957
semantic: 0.955
mistranslation: 0.954
VMM: 0.953
TCG: 0.950
user-level: 0.949
performance: 0.947
arm: 0.947
vnc: 0.947
boot: 0.936
device: 0.936
ppc: 0.936
assembly: 0.935
socket: 0.935
x86: 0.934
kernel: 0.932
graphic: 0.931
architecture: 0.922
register: 0.918
files: 0.917
PID: 0.916
KVM: 0.914
network: 0.910
permissions: 0.908
hypervisor: 0.904
i386: 0.880

-icount increases boot time by >10x

When I specify the -icount option, some guest operations such as booting a Linux kernel take more than 10 times longer than otherwise.  For example, the following will boot Aboriginal Linux to the login prompt about 6 seconds on my system (using TCG, not KVM):

wget http://landley.net/aboriginal/downloads/old/binaries/1.4.5/system-image-i686.tar.gz
gunzip <system-image-i686.tar.gz | tar xfv -
cd system-image-i686
sh run-emulator.sh

If I replace the last line with

QEMU_EXTRA="-icount shift=auto" sh run-emulator.sh

booting to the login prompt takes about 1 minute 20 seconds.

I have tried different values for "shift" other than the "auto" used above, but have not been able to find one that gives reasonable performance.  Specifying "sleep=off" also did not help.

During the slow boots, qemu appears to spend most of its time sleeping, not using the host CPU.

I see this with multiple versions of qemu, including current git sources (c181ddaa176856b3cd2dfd12bbcf25fa9c884a97), and on multiple host OSes, including Debian 9 on x86_64.

Yes, I can reproduce this. Also, interestingly, with the -icount version when I typed 'exit' to quit the guest I got this abort:

qemu: fatal: cpu_io_recompile: could not find TB for pc=0x7fcd0aae636e
EAX=c88053a4 EBX=00000069 ECX=c00211a4 EDX=00000000
ESI=c7812180 EDI=c78126f0 EBP=0000000f ESP=c7a0de44
EIP=c11ada30 EFL=00000287 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
CS =0060 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0068 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =007b 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
FS =0000 00000000 00000000 00000000
GS =0033 080a130c ffffffff 00dff300 DPL=3 DS   [-WA]
LDT=0000 00000000 00000000 00008200 DPL=0 LDT
TR =0080 c12fbe60 0000206b 00008900 DPL=0 TSS32-avl
GDT=     c12f6000 000000ff
IDT=     ffffc000 000007ff
CR0=8005003b CR2=09d40ffc CR3=07a08000 CR4=00000690
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=c8800000 CCD=c8805200 CCO=SUBL    
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=ec00000000000000 4006
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
Aborted (core dumped)


...ah, that's the bug fixed by this patch: https://patchwork.ozlabs.org/patch/940058/


The stalls in the test kernel seem to happen at and after the guest prints

tsc: Refined TSC clocksource calibration: 1000.022 MHz
clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0xe6a2bd9634, max_idle_ns: 440795211s
hdc: QEMU DVD-ROM, ATAPI CD/DVD-ROM drive
clocksource: Switched to clocksource tsc

so I suspect a bad interaction between what icount is doing and in particular our emulation of the x86 timestamp counter and what Linux does with it.


A couple of comments...  First, the problem is not limited to Linux guests.  In fact, I originally ran across it with a NetBSD guest, but then reproduced it with a Linux guest for the bug report, because in my experience, qemu bug reports involving non-Linux guests tend to be ignored.

Second, the qemu man page says that "With sleep=on|off, the virtual time will jump to the next timer deadline instantly whenever the virtual cpu goes to sleep mode".  Although the text is confusing (see bug 1774412), to me it suggests that when the mode in case is selected, qemu should execute the guest as fast as possible and only provide it with an illusion of the passage of time, and there should be no reason for qemu itself to ever sleep.  Yet, regardless of whether I specify "sleep=on" or "sleep=off", qemu sleeps.


I bisected this using the following test case (after the setup shown in the original bug report, and replacing "e1000" by "ne2k_pci" in the run-emulator.sh script to avoid a panic in the e1000 driver with some qemu versions):

QEMU_EXTRA="-icount shift=3,sleep=off" sh run-emulator.sh

The bisection identified the following commit:

281b2201e4e18d5b9a26e1e8d81b62b5581a13be is the first bad commit
commit 281b2201e4e18d5b9a26e1e8d81b62b5581a13be
Author: Pavel Dovgalyuk <email address hidden>
Date:   Thu Mar 10 14:56:03 2016 +0300

    icount: remove obsolete warp call

    qemu_clock_warp call in qemu_tcg_wait_io_event function is not needed
    anymore, because it is called in every iteration of main_loop_wait.

    Reviewed-by: Paolo Bonzini <email address hidden>

    Signed-off-by: Pavel Dovgalyuk <email address hidden>
    Message-Id: <20160310115603.4812.67559.stgit@PASHA-ISP>
    Signed-off-by: Paolo Bonzini <email address hidden>

With revision 281b2201e4e18d5b9a26e1e8d81b62b5581a13be, the test case takes 141 seconds to boot.  With the previous revision, 33577b47c64435fcc2a1bc01c7e82534256f1fc3, it takes 1.7 seconds.


I can reproduce it. But there is one interesting thing.
When I replace "-nographic" with "-serial stdio" everything is ok,
icount works normally.

But "-nographic" runs are very slow in icount mode.
It seems that GUI produces some events that force main loop to iterate.

Does reverting the commit fix it? It was just a cleanup I think.

Yes, I tried to revert (with some updates to the current code base) and the bug was fixed:

diff --git a/cpus.c b/cpus.c
index 181ce33..e2fc972 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1172,6 +1172,13 @@ static void qemu_tcg_rr_wait_io_event(CPUState *cpu)
 {
     while (all_cpu_threads_idle()) {
         stop_tcg_kick_timer();
+
+        qemu_mutex_unlock_iothread();
+        replay_mutex_lock();
+        qemu_mutex_lock_iothread();
+        qemu_start_warp_timer();
+        replay_mutex_unlock();
+
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 

I tested Pavel's patch, applying it to master (c447afd5783b9237fa51b7a85777007d8d568bfc), but I'm afraid it only made things worse - qemu has now been booting the test kernel for 30 minutes but the boot has still not completed.  The last console messages printed were:

piix 0000:00:01.1: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xc100-0xc107
    ide1: BM-DMA at 0xc108-0xc10f

Running strace -p on the qemu process shows it calling ppoll once per second:

ppoll([{fd=0, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}], 5, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
ppoll([{fd=0, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}], 5, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
ppoll([{fd=0, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}], 5, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
ppoll([{fd=0, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}], 5, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
ppoll([{fd=0, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}], 5, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
ppoll([{fd=0, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}], 5, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)


We figured out that there are races or something.
I run QEMU under Ubuntu in VirtualBox (which has 1 CPU) and it works normally.
On the real machine with multiple cores QEMU stalls with this patch.

Looks like the bug was fixed by this commit:

commit 013aabdc665e4256b38d8875a1a7b5e030ba98f1
Author: Clement Deschamps <email address hidden>
Date:   Sun Oct 21 16:21:03 2018 +0200

    icount: fix deadlock when all cpus are sleeping

    When all cpus are sleeping (e.g in WFI), to avoid a deadlock
    in the main_loop, wake it up in order to start the warp timer.

    Signed-off-by: Clement Deschamps <email address hidden>
    Message-Id: <email address hidden>
    Signed-off-by: Paolo Bonzini <email address hidden>