summary refs log tree commit diff stats
path: root/results/scraper/launchpad/1755912
blob: 8f350ccc88598931739c273041f765d56d8a6cf2 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
qemu-system-x86_64 crashed with SIGABRT when using option -vga qxl

When using qemu-system-x86_64 with the option -vga qxl, it crashes. The easiest way to crash it is by trying to change the guest's resolution. However, the system may randomly crash too, not happening only when changing resolution. Here is the terminal output of one of these random crashes:

--------

$ qemu-system-x86_64 -hda /dev/sdb -m 2048 -enable-kvm -cpu host -vga qxl -nodefaults -netdev user,id=hostnet0 -device virtio-net-pci,id=net0,netdev=hostnet0
WARNING: Image format was not specified for '/dev/sdb' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.

(process:21313): Spice-WARNING **: 16:01:45.759: display-channel.c:2431:display_channel_validate_surface: canvas address is 0x7f8eb948ab18 for 0 (and is NULL)


(process:21313): Spice-WARNING **: 16:01:45.759: display-channel.c:2432:display_channel_validate_surface: failed on 0

(process:21313): Spice-CRITICAL **: 16:01:45.759: display-channel.c:2035:display_channel_update: condition `display_channel_validate_surface(display, surface_id)' failed
Abortado (imagem do núcleo gravada)

--------

I was running QEMU as a normal user which is on the groups kvm and disk. Initially I supposed the problem was because I was running QEMU as root, but as a normal user this happens too.

I have tested with guests with different Ubuntu version: 18.04, 17.10 and 16.04. It is happening with them all.

ProblemType: Crash
DistroRelease: Ubuntu 18.04
Package: qemu-system-x86 1:2.11+dfsg-1ubuntu4
ProcVersionSignature: Ubuntu 4.15.0-10.11-generic 4.15.3
Uname: Linux 4.15.0-10-generic x86_64
ApportVersion: 2.20.8-0ubuntu10
Architecture: amd64
CurrentDesktop: XFCE
Date: Wed Mar 14 17:13:52 2018
ExecutablePath: /usr/bin/qemu-system-x86_64
InstallationDate: Installed on 2017-06-13 (273 days ago)
InstallationMedia: Xubuntu 17.04 "Zesty Zapus" - Release amd64 (20170412)
KvmCmdLine: COMMAND         STAT  EUID  RUID   PID  PPID %CPU COMMAND
MachineType: LENOVO 80UG
ProcCmdline: qemu-system-x86_64 -hda /dev/sdb -smp cpus=2 -m 512 -enable-kvm -cpu host -vga qxl -nodefaults -netdev user,id=hostnet0 -device virtio-net-pci,id=net0,netdev=hostnet0
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-10-generic.efi.signed root=UUID=6b4ae5c0-c78c-49a6-a1ba-029192618a7a ro quiet
Signal: 6
SourcePackage: qemu
StacktraceTop:
 () at /usr/lib/x86_64-linux-gnu/libspice-server.so.1
 () at /usr/lib/x86_64-linux-gnu/libspice-server.so.1
 () at /usr/lib/x86_64-linux-gnu/libspice-server.so.1
 () at /usr/lib/x86_64-linux-gnu/libspice-server.so.1
 () at /usr/lib/x86_64-linux-gnu/libspice-server.so.1
Title: qemu-system-x86_64 crashed with SIGABRT
UpgradeStatus: Upgraded to bionic on 2017-10-20 (145 days ago)
UserGroups: adm bluetooth cdrom dialout dip disk kvm libvirt lpadmin netdev plugdev sambashare sudo
dmi.bios.date: 07/10/2017
dmi.bios.vendor: LENOVO
dmi.bios.version: 0XCN43WW
dmi.board.asset.tag: NO Asset Tag
dmi.board.name: Toronto 4A2
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40679 WIN
dmi.chassis.asset.tag: NO Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Lenovo ideapad 310-14ISK
dmi.modalias: dmi:bvnLENOVO:bvr0XCN43WW:bd07/10/2017:svnLENOVO:pn80UG:pvrLenovoideapad310-14ISK:rvnLENOVO:rnToronto4A2:rvrSDK0J40679WIN:cvnLENOVO:ct10:cvrLenovoideapad310-14ISK:
dmi.product.family: IDEAPAD
dmi.product.name: 80UG
dmi.product.version: Lenovo ideapad 310-14ISK
dmi.sys.vendor: LENOVO



StacktraceTop:
 spice_logv (log_domain=0x7fb001524195 "Spice", args=0x7fafbf9fe600, format=0x7fb001525015 "condition `%s' failed", function=0x7fb001527ef0 <__func__.47520> "display_channel_update", strloc=0x7fb001527c0f "display-channel.c:2035", log_level=G_LOG_LEVEL_CRITICAL) at log.c:183
 spice_log (log_level=log_level@entry=G_LOG_LEVEL_CRITICAL, strloc=strloc@entry=0x7fb001527c0f "display-channel.c:2035", function=function@entry=0x7fb001527ef0 <__func__.47520> "display_channel_update", format=format@entry=0x7fb001525015 "condition `%s' failed") at log.c:196
 display_channel_update (display=0x56421590aa30, surface_id=0, area=area@entry=0x56421590ee1c, clear_dirty=1, qxl_dirty_rects=qxl_dirty_rects@entry=0x7fafbf9fe770, num_dirty_rects=num_dirty_rects@entry=0x7fafbf9fe76c) at display-channel.c:2035
 handle_dev_update_async (opaque=0x56421590ebe0, payload=0x56421590ee10) at red-worker.c:428
 dispatcher_handle_single_read (dispatcher=0x56421590e080) at dispatcher.c:284








Subscribed Christian Ehrhardt, who might have an idea.

Hmm,
I have no good idea unfortunately.
I tried it a few times (18.04 desktop guest 8 resolution changes) - showed no issue for me.
Is this depending on the type of guests that you run?

I drive ti through libvirt, which adds quite some variables in the new format.
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
You might experiment if some of them set would mitigate your issue.

Looking at the warnings - they are from display_channel_validate_surface but they are "safe" which means they check, detect it is null and then skips.
But maybe later on something else crashes on the same canvas being Null maybe?

Are the warnings like:
Spice-WARNING **: 16:01:45.759: display-channel.c:2431:display_channel_validate_surface: canvas address is 0x7f8eb948ab18 for 0 (and is NULL)
appearing right before the crash or when you start?

The crash itself seems to be:
display_channel_update -> display_channel_validate_surface (which emits the warnings) -> spice_warning -> spice_log -> spice_logv
This crashes if >= a certain log level - the warning above triggers it.
So the question is why is the canvas Null?

2429     if (!display->priv->surfaces[surface_id].context.canvas) {                   
2430         spice_warning("canvas address is %p for %d (and is NULL)\n",             
2431                    &(display->priv->surfaces[surface_id].context.canvas), surface_id);
2432         spice_warning("failed on %d", surface_id);                               
2433         return FALSE;                                                            
2434     }  

handle_dev_update_async gets that value indirectly via
  RedWorker *worker = opaque;
  ...
  display_channel_update(worker->display_channel,
So the update that kills the pointer might be anywhere in between in this async paths.
I'm not a subject matter expert on this async UI updating :-/

If this really affects all releases way back including the latest we should try to build you the latest from source and if it affects this as well open the bug against upstream as well. As there we need a fix/discussion first then.
Can you compile qemu from source for yourself for this check or do you need help with that.

I was able to build it from source. In the first try I was unable to test because it hadn't the spice protocol enabled.

The interface QEMU is using is different, as it has a menu bar with "Machine" and "View" with some options, but I could test and I could reproduce the crash. To clarify, I have recorded the VMs crashing.

Please note I have the notebook screen and a external monitor, so the resolution is a bit strange.

With the command: ./qemu-system-x86_64 -hda ~/Downloads/xubuntu-bionic-desktop-amd64-2018-04-22.iso -m 1536 -vga qxl -enable-kvm -cpu host -smp cpus=2

https://mega.nz/#!98ZiHY5b!ZOaNjb1OaZVj0V80GRjkqafOAL2UinVlAEiTP9aazdk

With the command: ./qemu-system-x86_64 -hda ~/Downloads/xubuntu-bionic-desktop-amd64-2018-04-22.iso -m 1536 -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x3 -enable-kvm -cpu host -smp cpus=2

https://mega.nz/#!lwRDRA7I!no-6S6cWxmRf8Q8cjSWfN269PM9DISjLmN7QM6LYBC4

The examples are with Live ISOs, Using -cdrom (the most correct option) instead of -hda still have the same crashes. I can reproduce with a installed VM and with a physical system started in QEMU.

Additional effects I've seen are the guests freezing instead of crashing and a significant RAM memory peak when starting the VM using libvirt. If high amounts of memory are being used in the moment the VM is started, approximately 128 MB of memory goes to SWAP that is never freed. The only way to freed it is to umount the SWAP partition (I tested until SWAP had 658 MB of memory).

What I meant is that with a Bionic host, the guests crash regardless of what the guest is. For now I've tested only with Ubuntu and its derivatives, from 12.04 to 18.04 and they consistently have the issue. I'm downloading CentOS but the download is slow, so it will take some time. I will add the CentOS guest test results after I test it.

The download of CentOS 7 was finished and I downloaded Fedora Workstation 27 later. With both as guests the VM still crash, so it's not a issue exclusive to Ubuntu guests.

The crash may happen after 8 resolution changes (CentOS) or in the first one (Fedora 27) or if the system is running and suddenly crashes (Xubuntu 17.10 in a external HDD).

Thanks Leonardo, with that confirmed:
- not dependent on the guest distribution
- affecting latest upstream
- good logs on the crash

The videos are not needed but nice to proove your case (just as my pre-analysis of the code path is nice but likely not useful to a developer that regularly works on that code sections).

Overall this should really be ready for upstreams attention, I added a Qemu task which will auto mirror this to the Mailing list.

The bug traces so far had no private information, so I opened up the state to be visible to everyone.

QEMU from git apparently is fixed, but Ubuntu's version is still problematic.

Using an Xubuntu 18.04 guest, it's possible to reproduce the crash using:

while true ; do xrandr --output Virtual-0 --mode 640x480 ; sleep 1 ; xrandr --output Virtual-0 --mode 1280x720 ; sleep 1 ; xrandr --output Virtual-0 --mode 1920x1080 ; sleep 1 ; done

In less than 20 seconds the guest crash with:

(process:16447): Spice-CRITICAL **: 15:34:52.047: display-channel.c:2035:display_channel_update: condition `display_channel_validate_surface(display, surface_id)' failed
Abortado (imagem do núcleo gravada)

Very interesting, Still not triggering for me :-/
Could you check if the PPA in [1] (with qemu 2.12 planned for Cosmic) already fixes it for you?

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3306/+packages

Link is better as https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3306

Unfortunately it did not work, the error is still the same although it used the GTK interface this time. The command I used was:

$ qemu-system-x86_64 -vga qxl -enable-kvm -cpu host -smp cores=2,threads=2 -m 2048 -cdrom xubuntu-18.04-desktop-amd64.iso

I have noticed Ubuntu's QEMU and this QEMU don't have the OpenGL option enabled, may this be related to the issue?

Yeah we switched to gtk.
With OpenGL which do you mean - the virtgl based support or some other config option?

Since it still fails to reproduce for me, but you already have a self build qemu that is good.
Could you based on this build 2.12 from source as well and confirm that it fails.
If it does you could bisect from 2.12 to master what fixed it so that we can consider that patch.
Otherwise if 2.12 from source in your own build works lets take a look at the build options that you mentioned.

I meant the option --enable-opengl and, for old versions, --enable-gtk-gl. I know it is required to use virtual machines using Intel GVT-g with dma-buf, and is a option strangely absent from QEMU configuration from Ubuntu's build. Without it, virgl fails too, making virt-manager have an option that does not work (under Spice Display, OpenGL option does not work).

The 2.12 build does crash when tested. That while loop is pretty efficient to trigger it. The git version can keep running it for hours straight with no problem, while the problematic versions crash in seconds.

As QEMU configuration is a result mixed between packages found automatically and those manually set by me, here is the configure command and its results from my QEMU build: https://paste.ubuntu.com/p/z9vnFdTnkD/

Please note that I had no need to build other targets for my use, so the configuration is much smaller than the one used by Ubuntu's QEMU.

The bisect is done and this is the result: 

5bd5c27c7d284d01477c5cc022ce22438c46bf9f is the first new commit
commit 5bd5c27c7d284d01477c5cc022ce22438c46bf9f
Author: Gerd Hoffmann <email address hidden>
Date:   Fri Apr 27 13:55:28 2018 +0200

    qxl: fix local renderer crash
    
    Make sure we only ask the spice local renderer for display updates in
    case we have a valid primary surface.  Without that spice is confused
    and throws errors in case a display update request (triggered by
    screendump for example) happens in parallel to a mode switch and hits
    the race window where the old primary surface is gone and the new isn't
    establisted yet.
    
    Cc: <email address hidden>
    Fixes: https://bugzilla.redhat.com//show_bug.cgi?id=1567733
    Signed-off-by: Gerd Hoffmann <email address hidden>
    Reviewed-by: Marc-André Lureau <email address hidden>
    Message-id: <email address hidden>

:040000 040000 ed86f864314483660b3c2abe045361ae7c98a5dc f0d55b3e98dd0864d6686fba052d83ae5545d007 M	hw


Nice, thanks Leonardo for the Bisect - lets call upstream Fixed Committed then (as there is no 2.13 released yet).

For the separate gl discussion feel free to subscribe/chime in on bug 1657409

Currently 2.12 is on its way into Cosmic.
I'll add and test that patch afterwards, it looks small and safe to me.

This bug was fixed in the package qemu - 1:2.12+dfsg-3ubuntu3

---------------
qemu (1:2.12+dfsg-3ubuntu3) cosmic; urgency=medium

  * d/p/lp-1755912-qxl-fix-local-renderer-crash.patch: Fix an issue triggered
    by migrations with UI frontends or frequent guest resolution changes
    (LP: #1755912)

 -- Christian Ehrhardt <email address hidden>  Thu, 19 Jul 2018 08:26:52 +0200

Working on Bionic SRu prep in https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3367/+packages

Hello Leonardo, or anyone else affected,

Accepted qemu into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:2.11+dfsg-1ubuntu7.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package.  See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in advance!

Thank you for the effort to backport this bug fix, I have tested the proposed version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.5) and the crash when changing resolution is no longer happening. Unfortunately, after some time (about 10 minutes, much longer than without this fix), the guest freezes.

I reported that in https://bugs.launchpad.net/qemu/+bug/1787070

I'll change the tag to verification-done-bionic, as this particular issue was corrected. The other issue still exists in QEMU upstream.

Note: The code change already passed the general regression checks on the identical content against a PPA (Also on the weekend prior to the full maturing period I'll have another automated run on proposed).

Running the Bionic ISO like:
$ qemu-system-x86_64 -cpu host -smp cores=4,threads=2 -boot d -m 2048 -enable-kvm -vga qxl -vnc :21 -cdrom ubuntu-18.04-desktop-amd64.iso

Attaching like:
$ vncviewer FullColor=1 AutoSelect=0 10.245.168.42:5921
(alternatives on tigervnc)
Well for me it had "-k de" as well :-)

Then boot into the "try Ubuntu" live CD mode.
There opened a terminal to loop on xrandr.

Running the loop of above in that guest to crash it after a while.



Upgrade to proposed:
$ sudo apt install qemu-system-x86=1:2.11+dfsg-1ubuntu7.5
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Suggested packages:
  samba vde2 qemu-block-extra sgabios ovmf
The following packages will be upgraded:
  qemu-system-x86
1 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.
Need to get 5.168 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 qemu-system-x86 amd64 1:2.11+dfsg-1ubuntu7.5 [5.168 kB]
Fetched 5.168 kB in 1s (7.666 kB/s)          
(Reading database ... 127990 files and directories currently installed.)
Preparing to unpack .../qemu-system-x86_1%3a2.11+dfsg-1ubuntu7.5_amd64.deb ...
Unpacking qemu-system-x86 (1:2.11+dfsg-1ubuntu7.5) over (1:2.11+dfsg-1ubuntu7.4) ...
Setting up qemu-system-x86 (1:2.11+dfsg-1ubuntu7.5) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...


Running the loop again with the same setup  - no crash in 15 minutes - assume that means good now.
I'd be glad about someone else checking the result as well, best someone formerly affected by it.
(and having a tick in the eye for seeing my right screen change sizes and flicker all the time)

Setting verified

Thanks Leonardo for your check as well!
I agree that this particular issue here is fixed as we hoped.
The other one with the freeze did not occur for me, you might open another bug if you have any pointers how we could go on on this.

Arr reading is hard today, you had the other bug open already ... grml :-)

Never the less - This bug here is fixed in the proposed version and for now that is the important part.

Subscribed there as well now to stay on top of it if upstream gets to a conclusion.

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates.  Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report.  In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

This bug was fixed in the package qemu - 1:2.11+dfsg-1ubuntu7.5

---------------
qemu (1:2.11+dfsg-1ubuntu7.5) bionic; urgency=medium

  [Christian Ehrhardt]
  * d/p/lp-1755912-qxl-fix-local-renderer-crash.patch: Fix an issue triggered
    by migrations with UI frontends or frequent guest resolution changes
    (LP: #1755912)

  [ Murilo Opsfelder Araujo ]
  * d/p/ubuntu/target-ppc-extend-eieio-for-POWER9.patch: Backport to
    extend eieio for POWER9 emulation (LP: #1787408).

 -- Christian Ehrhardt <email address hidden>  Tue, 21 Aug 2018 11:25:45 +0200