Atomic test-and-set instruction does not work on qemu-user
I'm trying to compile and run the PostgreSQL/Greenplum database inside a Docker container under qemu-aarch64-static:
```
host: CentOS7 x86_64
container: centos:centos7.9.2009 --platform linux/arm64/v8
qemu-user-static: https://github.com/multiarch/qemu-user-static/releases/
```
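(For reference, a typical way to set this up with the multiarch images is to register the binfmt handlers and then start the arm64 container; this is a sketch of the usual invocation, not necessarily the exact commands used here:)
```
$ docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
$ docker run --rm -it --platform linux/arm64/v8 centos:centos7.9.2009 /bin/bash
```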
However, GP/PG's spinlock always gets stuck and reports PANIC errors. Its spinlock
implementation appears to be at fault:
```
https://github.com/greenplum-db/gpdb/blob/master/src/include/storage/s_lock.h
https://github.com/greenplum-db/gpdb/blob/master/src/backend/storage/lmgr/s_lock.c
```
So I extracted its spinlock implementation into a single test C source file (see the attached file)
and reproduced the problem:
```
$ gcc spinlock_qemu.c
$ ./a.out
C -- slock inited, lock value is: 0
parent 139642, child 139645
P -- slock lock before, lock value is: 0
P -- slock locked, lock value is: 1
P -- slock unlock after, lock value is: 0
C -- slock lock before, lock value is: 1
P -- slock lock before, lock value is: 1
C -- slock locked, lock value is: 1
C -- slock unlock after, lock value is: 0
C -- slock lock before, lock value is: 1
P -- slock locked, lock value is: 1
P -- slock unlock after, lock value is: 0
P -- slock lock before, lock value is: 1
C -- slock locked, lock value is: 1
C -- slock unlock after, lock value is: 0
P -- slock locked, lock value is: 1
C -- slock lock before, lock value is: 1
P -- slock unlock after, lock value is: 0
C -- slock locked, lock value is: 1
P -- slock lock before, lock value is: 1
C -- slock unlock after, lock value is: 0
P -- slock locked, lock value is: 1
C -- slock lock before, lock value is: 1
P -- slock unlock after, lock value is: 0
C -- slock locked, lock value is: 1
P -- slock lock before, lock value is: 1
C -- slock unlock after, lock value is: 0
P -- slock locked, lock value is: 1
C -- slock lock before, lock value is: 1
P -- slock unlock after, lock value is: 0
P -- slock lock before, lock value is: 1
spin timeout, lock value is 1 (pid 139642)
spin timeout, lock value is 1 (pid 139645)
spin timeout, lock value is 1 (pid 139645)
spin timeout, lock value is 1 (pid 139642)
spin timeout, lock value is 1 (pid 139645)
spin timeout, lock value is 1 (pid 139642)
...
...
...
```
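The attachment itself is not reproduced above, but a minimal standalone reproducer in the same spirit looks roughly like the sketch below. This is a reconstruction, not the actual attachment: the s_lock()/s_unlock() helpers are modeled on the log output, and SPINS_PER_REPORT is a made-up threshold.
```
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

typedef volatile int slock_t;

/* Hypothetical spin-report threshold; the attachment's value may differ. */
#define SPINS_PER_REPORT 100000000L

/* Greenplum-style TAS: returns the previous value, 0 == acquired. */
static int tas(slock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

static void s_lock(slock_t *lock, const char *who)
{
    long spins = 0;
    printf("%s -- slock lock before, lock value is: %d\n", who, *lock);
    while (tas(lock) != 0) {
        if (++spins % SPINS_PER_REPORT == 0)
            printf("spin timeout, lock value is %d (pid %d)\n",
                   *lock, (int)getpid());
    }
    printf("%s -- slock locked, lock value is: %d\n", who, *lock);
}

static void s_unlock(slock_t *lock, const char *who)
{
    __sync_lock_release(lock);            /* store 0, release semantics */
    printf("%s -- slock unlock after, lock value is: %d\n", who, *lock);
}

int main(void)
{
    /* Shared anonymous mapping so both processes contend on one lock. */
    slock_t *lock = mmap(NULL, sizeof *lock, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    *lock = 0;
    printf("C -- slock inited, lock value is: %d\n", *lock);

    pid_t pid = fork();
    const char *who = (pid == 0) ? "C" : "P";
    if (pid > 0)
        printf("parent %d, child %d\n", (int)getpid(), (int)pid);

    for (int i = 0; i < 100; i++) {       /* ping-pong on the spinlock */
        s_lock(lock, who);
        usleep(1000);
        s_unlock(lock, who);
    }
    if (pid > 0)
        waitpid(pid, NULL, 0);
    return 0;
}
```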
NOTE: this code always works on a PHYSICAL ARM64 server.
Interestingly, the spinlock test works after I change the tas() implementation
FROM
__sync_lock_test_and_set(lock, 1);
TO
__sync_val_compare_and_swap(lock, 0, 1);
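That is, switching between two helpers along these lines (a sketch; the real implementation is the tas() macro in Greenplum's s_lock.h, and both builtins return the previous lock value, so 0 still means the lock was acquired):
```
typedef volatile int slock_t;

/* Hangs under qemu-user: unconditionally swaps 1 in and
 * returns the previous value. */
static int tas_swap(slock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

/* Works under qemu-user: stores 1 only when the lock currently
 * reads 0; also returns the previous value. */
static int tas_cas(slock_t *lock)
{
    return __sync_val_compare_and_swap(lock, 0, 1);
}
```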
Compiler: gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
====
__sync_lock_test_and_set(lock, 1) disassembly:
```
objdump -S a.out
000000000040073c <tas>:
40073c: d10043ff sub sp, sp, #0x10
400740: f90007e0 str x0, [sp, #8]
400744: f94007e0 ldr x0, [sp, #8]
400748: 52800021 mov w1, #0x1 // #1
40074c: 885f7c02 ldxr w2, [x0]
400750: 88037c01 stxr w3, w1, [x0]
400754: 35ffffc3 cbnz w3, 40074c <tas+0x10>
400758: d5033bbf dmb ish
40075c: 2a0203e0 mov w0, w2
400760: 910043ff add sp, sp, #0x10
400764: d65f03c0 ret
```
====
__sync_val_compare_and_swap(lock, 0, 1) disassembly:
```
objdump -S a.out
000000000040073c <tas>:
40073c: d10043ff sub sp, sp, #0x10
400740: f90007e0 str x0, [sp, #8]
400744: f94007e0 ldr x0, [sp, #8]
400748: 52800021 mov w1, #0x1 // #1
40074c: 885f7c02 ldxr w2, [x0]
400750: 35000062 cbnz w2, 40075c <tas+0x20>
400754: 8803fc01 stlxr w3, w1, [x0]
400758: 35ffffa3 cbnz w3, 40074c <tas+0x10>
40075c: 7100005f cmp w2, #0x0
400760: d5033bbf dmb ish
400764: 2a0203e0 mov w0, w2
400768: 910043ff add sp, sp, #0x10
40076c: d65f03c0 ret
```
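For what it's worth, the two listings differ in two visible ways: the CAS version refuses to store at all when the lock already reads 1 (cbnz w2, 40075c), and it uses stlxr (store-release exclusive) rather than a plain stxr. One plausible reading of the hang, though not confirmed in this thread, is that qemu-user's emulation of the ldxr/stxr exclusive pair is not truly atomic across forked processes sharing memory, so a stale exclusive store of 1 can overwrite the other process's unlock store of 0; the CAS sequence never issues a store once the lock reads 1, which closes that window.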
The QEMU project is currently moving its bug tracking to another system.
For this we need to know which bugs are still valid and which could be
closed already. Thus we are setting the bug state to "Incomplete" now.
If the bug has already been fixed in the latest upstream version of QEMU,
then please close this ticket as "Fix released".
If it is not fixed yet and you think that this bug report here is still
valid, then you have two options:
1) If you already have an account on gitlab.com, please open a new ticket
for this problem in our new tracker here:
https://gitlab.com/qemu-project/qemu/-/issues
and then close this ticket here on Launchpad (or let it expire
automatically after 60 days). Please mention the URL of this bug ticket
on Launchpad in the new ticket on GitLab.
2) If you don't have an account on gitlab.com and don't intend to get
one, but still would like to keep this ticket open, then please switch
the state back to "New" or "Confirmed" within the next 60 days (otherwise
it will get closed as "Expired"). We will then eventually migrate the
ticket automatically to the new system (but you won't be the reporter of
the bug in the new system and thus you won't get notified of changes
anymore).
Thank you and sorry for the inconvenience.
[Expired for QEMU because there has been no activity for 60 days.]
This is an automated cleanup. This bug report has been moved to QEMU's
new bug tracker on gitlab.com and thus gets marked as 'expired' now.
Please continue with the discussion here:
https://gitlab.com/qemu-project/qemu/-/issues/509
Hi taos! Could you please check whether this has already been fixed in QEMU v6.1.0-rc1?
Thanks. Tested; the problem is gone.