1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
|
qemu squeeze crashes "BUG: unable to handle kernel NULL pointer dereference at (null)"
my virtual machine server (qemu+libvirt) regularly breaks down with such a record in the logs
I can not even ping the guest, but i can ping host, but can not do something with it (cannot ssh login for example)
And I dont know how to reproduce the problem :(
Mar 15 17:58:04 mainhost kernel: [65866.976982] BUG: unable to handle kernel NULL pointer dereference at (null)
Mar 15 17:58:04 mainhost kernel: [65866.977422] IP: [<ffffffff8100efbe>] 0xffffffff8100efbe
Mar 15 17:58:04 mainhost kernel: [65866.977663] PGD 7387b7067 PUD 81b723067 PMD 0.
Mar 15 17:58:04 mainhost kernel: [65866.977902] Oops: 0000 [#1] SMP.
Mar 15 17:58:04 mainhost kernel: [65866.978128] last sysfs file: /sys/devices/system/cpu/cpu3/topology/thread_siblings
Mar 15 17:58:04 mainhost kernel: [65866.978572] CPU 1.
Mar 15 17:58:04 mainhost kernel: [65866.978577] Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc ebtable_nat ebtables coretemp bridge stp llc xt_state
Mar 15 17:58:04 mainhost kernel: [65866.979737].
Mar 15 17:58:04 mainhost kernel: [65866.979959] Pid: 3369, comm: qemu-system-x86 Not tainted 2.6.37-gentoo-r2 #2 Intel S5000VSA/S5000VSA
Mar 15 17:58:04 mainhost kernel: [65866.980085] RIP: 0010:[<ffffffff8100efbe>] [<ffffffff8100efbe>] 0xffffffff8100efbe
Mar 15 17:58:04 mainhost kernel: [65866.980085] RSP: 0018:ffff880738767a48 EFLAGS: 00010246
Mar 15 17:58:04 mainhost kernel: [65866.980085] RAX: 0000000000000000 RBX: fffffffffffff001 RCX: ffff88081cbeb948
Mar 15 17:58:04 mainhost kernel: [65866.980085] RDX: 0000000000000022 RSI: fffffffffffff001 RDI: ffff88081cbeb000
Mar 15 17:58:04 mainhost kernel: [65866.980085] RBP: 0000000000000001 R08: 00000000000fee01 R09: 0000000000000022
Mar 15 17:58:04 mainhost kernel: [65866.980085] R10: 0000008000000000 R11: ffffea0000000000 R12: ffff880818d83490
Mar 15 17:58:04 mainhost kernel: [65866.980085] R13: 00000000155e5000 R14: 0000000000000000 R15: 0000000000000100
Mar 15 17:58:04 mainhost kernel: [65866.980085] FS: 00007f5f25e4e700(0000) GS:ffff88009f680000(0000) knlGS:fffff80001175000
Mar 15 17:58:04 mainhost kernel: [65866.980085] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
Mar 15 17:58:04 mainhost kernel: [65866.980085] CR2: 0000000000000000 CR3: 0000000806be9000 CR4: 00000000000426e0
Mar 15 17:58:04 mainhost kernel: [65866.980085] DR0: 0000000000000045 DR1: 0000000000000000 DR2: 0000000000000000
Mar 15 17:58:04 mainhost kernel: [65866.980085] DR3: 0000000000000005 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 15 17:58:04 mainhost kernel: [65866.980085] Process qemu-system-x86 (pid: 3369, threadinfo ffff880738766000, task ffff8808203ac360)
Mar 15 17:58:04 mainhost kernel: [65866.980085] Stack:
Mar 15 17:58:04 mainhost kernel: [65866.980085] 0000000000000000 ffff8806a30f3ff8 ffff880753980000 ffffffff8100f06f
Mar 15 17:58:04 mainhost kernel: [65866.980085] 0000000000000ff8 ffff8807705d6b40 0000000000000ff8 ffffffff810123f0
Mar 15 17:58:04 mainhost kernel: [65866.980085] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Mar 15 17:58:04 mainhost kernel: [65866.980085] Call Trace:
Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff8100f06f>] ? 0xffffffff8100f06f
Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff810123f0>] ? 0xffffffff810123f0
Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff8100f744>] ? 0xffffffff8100f744
Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff8100ffaf>] ? 0xffffffff8100ffaf
Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff810011f1>] ? 0xffffffff810011f1
Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff810142fc>] ? 0xffffffff810142fc
Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff8100834d>] ? 0xffffffff8100834d
Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff81011af6>] ? 0xffffffff81011af6
Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff8100c5a7>] ? 0xffffffff8100c5a7
Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff81003a85>] ? 0xffffffff81003a85
Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff810e19b0>] ? 0xffffffff810e19b0
Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff81078cd8>] ? 0xffffffff81078cd8
Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff810e1a39>] ? 0xffffffff810e1a39
Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff810267fb>] ? 0xffffffff810267fb
Mar 15 17:58:04 mainhost kernel: [65866.980085] Code: 8b 47 50 8d 50 01 85 c0 89 57 50 75 07 41 58 e9 32 ff ff ff 5f c3 55 89 d5 53 48 89 f3 48 83 ec 08 e8 d
Mar 15 17:58:04 mainhost kernel: [65866.980085] RIP [<ffffffff8100efbe>] 0xffffffff8100efbe
Mar 15 17:58:04 mainhost kernel: [65866.980085] RSP <ffff880738767a48>
Mar 15 17:58:04 mainhost kernel: [65866.980085] CR2: 0000000000000000
Mar 15 17:58:04 mainhost kernel: [65866.990753] ---[ end trace d147f74976c2654d ]---
Linux mainhost 2.6.37-gentoo-r2 #2 SMP Mon Mar 14 22:53:20 MSK 2011 x86_64 Intel(R) Xeon(R) CPU E5405 @ 2.00GHz GenuineIntel GNU/Linux
app-emulation/qemu-kvm-0.13.0-r2
app-emulation/libvirt-0.8.8-r1
mainhost log # ps ax|grep qemu
2957 ? Sl 12:28 /usr/bin/qemu-system-x86_64 --enable-kvm -S -M pc-0.13 -enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -name dc1 -uuid f090bfc9-1cab-e4e9-51ea-80c9cc775bff -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/dc1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -boot order=c,menu=off -drive file=/dev/vm-storage/dc1,if=none,id=drive-ide0-0-0,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=13,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:7e:a1:a7,bus=pci.0,addr=0x4 -usb -device usb-tablet,id=input0 -vnc 0.0.0.0:0,password -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
2982 ? Rl 10:34 /usr/bin/qemu-system-x86_64 --enable-kvm -S -M pc-0.13 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name transarchive -uuid b96a3481-1ad6-9329-387e-a1504a17d0ee -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/transarchive.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -boot order=c,menu=off -drive file=/dev/vm-storage/transarchive,if=none,id=drive-ide0-0-0,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=13,id=hostnet0,vhost=on,vhostfd=17 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a9:f8:06,bus=pci.0,addr=0x3 -usb -device usb-tablet,id=input0 -vnc 0.0.0.0:3,password -vga std -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
On Tue, Mar 15, 2011 at 9:28 PM, Aidar Kamalov
<email address hidden> wrote:
> Mar 15 17:58:04 mainhost kernel: [65866.980085] Call Trace:
> Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff8100f06f>] ? 0xffffffff8100f06f
> Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff810123f0>] ? 0xffffffff810123f0
> Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff8100f744>] ? 0xffffffff8100f744
> Mar 15 17:58:04 mainhost kernel: [65866.980085] [<ffffffff8100ffaf>] ? 0xffffffff8100ffaf
Please compile a kernel with debug information so that the call trace
has function names. The option is CONFIG_DEBUG_INFO and can be
reached via "Kernel hacking | Compile the kernel with debug info" in
make menuconfig.
Stefan
I have compiled with CONFIG_DEBUG_INFO, but core is not creating, i think i do all right:
mainhost log # zcat /proc/config.gz |grep CONFIG_DEBUG_INFO
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_REDUCED is not set
mainhost log # cat /etc/security/limits.conf | grep core
# - core - limits the core file size (KB)
* soft core unlimited
mainhost log # ulimit -S -s
8192
mainhost log # ulimit -H -s
unlimited
mainhost log # ulimit -S -s 100000
mainhost log # ulimit -S -s
100000
mainhost log # cat /proc/sys/kernel/core_pattern
core
mainhost log # cat /proc/sys/kernel/core_uses_pid
0
mainhost log # ulimit -c
unlimited
mainhost log # yes & kill -ABRT `jobs -p`
[1] 3527
mainhost log #
[1]+ Aborted (core dumped) yes
On Wed, Mar 16, 2011 at 8:27 PM, Aidar Kamalov
<email address hidden> wrote:
> I have compiled with CONFIG_DEBUG_INFO, but core is not creating, i
> think i do all right:
>
> mainhost log # zcat /proc/config.gz |grep CONFIG_DEBUG_INFO
> CONFIG_DEBUG_INFO=y
> # CONFIG_DEBUG_INFO_REDUCED is not set
This looks fine
> mainhost log # cat /etc/security/limits.conf | grep core
> # - core - limits the core file size (KB)
> * soft core unlimited
> mainhost log # ulimit -S -s
> 8192
> mainhost log # ulimit -H -s
> unlimited
> mainhost log # ulimit -S -s 100000
> mainhost log # ulimit -S -s
> 100000
> mainhost log # cat /proc/sys/kernel/core_pattern
> core
> mainhost log # cat /proc/sys/kernel/core_uses_pid
> 0
> mainhost log # ulimit -c
> unlimited
> mainhost log # yes & kill -ABRT `jobs -p`
> [1] 3527
> mainhost log #
> [1]+ Aborted (core dumped) yes
These are core dump options for userspace processes. You originally
posted a kernel backtrace and that is not affected by core dump
options.
When the issue is triggered again you should see function names in the
"Call Trace:" section of the BUG output.
Stefan
Issue is trigered, but there are nothing changes, i will try to get core
Mar 16 21:50:29 mainhost kernel: [28123.087654] BUG: unable to handle kernel NULL pointer dereference at (null)
Mar 16 21:50:29 mainhost kernel: [28123.088106] IP: [<ffffffff8100fe7d>] 0xffffffff8100fe7d
Mar 16 21:50:29 mainhost kernel: [28123.088331] PGD 81a4f5067 PUD 81d08b067 PMD 0.
Mar 16 21:50:29 mainhost kernel: [28123.088555] Oops: 0000 [#1] SMP.
Mar 16 21:50:29 mainhost kernel: [28123.088776] last sysfs file: /sys/devices/system/cpu/cpu3/topology/thread_siblings
Mar 16 21:50:29 mainhost kernel: [28123.089218] CPU 2.
Mar 16 21:50:29 mainhost kernel: [28123.089223] Modules linked in: i5k_amb coretemp nfs lockd nfs_acl auth_rpcgss sunrpc ebtable_nat ebtables bridge stp llc
Mar 16 21:50:29 mainhost kernel: [28123.090526].
Mar 16 21:50:29 mainhost kernel: [28123.090526] Pid: 2900, comm: qemu-system-x86 Not tainted 2.6.38-gentoo #2 Intel S5000VSA/S5000VSA
Mar 16 21:50:29 mainhost kernel: [28123.090526] RIP: 0010:[<ffffffff8100fe7d>] [<ffffffff8100fe7d>] 0xffffffff8100fe7d
Mar 16 21:50:29 mainhost kernel: [28123.090526] RSP: 0018:ffff88081b9b5a68 EFLAGS: 00010246
Mar 16 21:50:29 mainhost kernel: [28123.090526] RAX: 0000000000000000 RBX: fffffffffffff001 RCX: 00000000000fee01
Mar 16 21:50:29 mainhost kernel: [28123.090526] RDX: 0000000000000022 RSI: fffffffffffff001 RDI: ffff8807be235000
Mar 16 21:50:29 mainhost kernel: [28123.090526] RBP: 0000000000000001 R08: 0000000000000022 R09: 0000000000000040
Mar 16 21:50:29 mainhost kernel: [28123.090526] R10: ffff88081b9b5be8 R11: 0000000000000000 R12: 0000000000000ff8
Mar 16 21:50:29 mainhost kernel: [28123.090526] R13: 0000000001cbf000 R14: 0000000000000013 R15: 0000000000000000
Mar 16 21:50:29 mainhost kernel: [28123.090526] FS: 00007fd821b70700(0000) GS:ffff88009f700000(0000) knlGS:000007fffffdd000
Mar 16 21:50:29 mainhost kernel: [28123.090526] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
Mar 16 21:50:29 mainhost kernel: [28123.090526] CR2: 0000000000000000 CR3: 000000081d245000 CR4: 00000000000426e0
Mar 16 21:50:29 mainhost kernel: [28123.090526] DR0: 0000000000000001 DR1: 0000000000000002 DR2: 0000000000000001
Mar 16 21:50:29 mainhost kernel: [28123.090526] DR3: 000000000000000a DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 16 21:50:29 mainhost kernel: [28123.090526] Process qemu-system-x86 (pid: 2900, threadinfo ffff88081b9b4000, task ffff88081f344fa0)
Mar 16 21:50:29 mainhost kernel: [28123.090526] Stack:
Mar 16 21:50:29 mainhost kernel: [28123.090526] 8000000815be5207 ffff8806bcf93ff8 ffff88081b9d4000 ffffffff8100ff2e
Mar 16 21:50:29 mainhost kernel: [28123.090526] ffffffff8100132a ffff88074b61a460 ffff88081a508000 ffffffff8101016c
Mar 16 21:50:29 mainhost kernel: [28123.090526] 0000000001cbf000 ffffffff81014d6d 000fffff00000001 0000000000001e76
Mar 16 21:50:29 mainhost kernel: [28123.090526] Call Trace:
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff8100ff2e>] ? 0xffffffff8100ff2e
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff8100132a>] ? 0xffffffff8100132a
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff8101016c>] ? 0xffffffff8101016c
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff81014d6d>] ? 0xffffffff81014d6d
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff810106a9>] ? 0xffffffff810106a9
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff810147d0>] ? 0xffffffff810147d0
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff81014c94>] ? 0xffffffff81014c94
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff8102283d>] ? 0xffffffff8102283d
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff810262a1>] ? 0xffffffff810262a1
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff8100d1b6>] ? 0xffffffff8100d1b6
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff81003f86>] ? 0xffffffff81003f86
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff810e0b37>] ? 0xffffffff810e0b37
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff81341d8e>] ? 0xffffffff81341d8e
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff810ed54c>] ? 0xffffffff810ed54c
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff810ed5d5>] ? 0xffffffff810ed5d5
Mar 16 21:50:29 mainhost kernel: [28123.090526] [<ffffffff810287fb>] ? 0xffffffff810287fb
Mar 16 21:50:29 mainhost kernel: [28123.090526] Code: 8b 47 50 8d 50 01 85 c0 89 57 50 75 07 41 58 e9 32 ff ff ff 5e c3 55 89 d5 53 48 89 f3 48 83 ec 08 e8 9
Mar 16 21:50:29 mainhost kernel: [28123.090526] RIP [<ffffffff8100fe7d>] 0xffffffff8100fe7d
Mar 16 21:50:29 mainhost kernel: [28123.090526] RSP <ffff88081b9b5a68>
Mar 16 21:50:29 mainhost kernel: [28123.090526] CR2: 0000000000000000
Mar 16 21:50:29 mainhost kernel: [28123.102127] ---[ end trace 68b482cf7220ceeb ]---
well, i has downgraded to 2.6.33 and system stable for 3 days yet..
system is halted on 2.6.36, 2,6.37 and 2.6.38 kernels
mainhost ~ # uptime
12:38:30 up 3 days, 2:56, 2 users, load average: 0.00, 0.00, 0.04
mainhost ~ # uname -a
Linux mainhost 2.6.33-gentoo-r1 #4 SMP Tue Aug 24 09:53:21 MSD 2010 x86_64 Intel(R) Xeon(R) CPU E5405 @ 2.00GHz GenuineIntel GNU/Linux
Sounds like this was a kernel bug ... you should report those to the right kernel mailing lists instead. Anyway, closing this ticket now since there hasn't been any activity since a long time.
|