results/scraper/launchpad/1719339


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38

serial8250: too much work for irq3

It's know issue and sometimes mentioned since 2007. But it seems not fixed.

http://lists.gnu.org/archive/html/qemu-devel/2008-02/msg00140.html
https://bugzilla.redhat.com/show_bug.cgi?id=986761
http://old-list-archives.xenproject.org/archives/html/xen-devel/2009-02/msg00696.html

I don't think fixes like increases PASS_LIMIT (/drivers/tty/serial/8250.c) or remove this annoying message (https://patchwork.kernel.org/patch/3920801/) is real fix. Some fix was proposed by H. Peter Anvin  https://lkml.org/lkml/2008/2/7/485.

Can reproduce on Debian Strech host (Qemu 1:2.8+dfsg-6+deb9u2), Ubuntu 16.04.2 LTS (Qemu 1:2.5+dfsg-5ubuntu10.15) also tried to use master branch (QEMU emulator version 2.10.50 (v2.10.0-766-ga43415ebfd-dirty)) if we write a lot of message into console (dmesg or dd if=/dev/zero of=/dev/ttyS1).

/usr/local/bin/qemu-system-x86_64 -name guest=ultra1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-27-ultra1/master-key.aes -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Client,ds=on,acpi=on,ss=on,ht=on,tm=on,pbe=on,dtes64=on,monitor=on,ds_cpl=on,vmx=on,smx=on,est=on,tm2=on,xtpr=on,pdcm=on,osxsave=on,tsc_adjust=on,clflushopt=on,pdpe1gb=on -m 4096 -realtime mlock=off -smp 4,sockets=1,cores=4,threads=1 -uuid 4537ca29-73b2-40c3-9b43-666de182ba5f -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-27-ultra1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x8.0x7 -drive file=/home/dzagorui/csr/csr_disk.qcow2,format=qcow2,if=none,id=drive-ide0-0-0 -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=26,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:a9:4c:86,bus=pci.0,addr=0x3 -chardev socket,id=charserial0,host=127.0.0.1,port=4000,telnet,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charserial1,host=127.0.0.1,port=4001,telnet,server,nowait -device isa-serial,chardev=charserial1,id=serial1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x2 -msg timestamp=on

Use simpler setup for reproducing. 
Was used only qemu-system-x86_64 (without using high-level wrappers and managers of virtual machines: libvirt, virsh, virt-install, virt-manager etc..). My setup with two consoles:

/usr/local/bin/qemu-system-x86_64 -cpu host -enable-kvm -m 256 -smp 4 -kernel /home/dzagorui//bzImage -append 'root=/dev/ram0 loglevel=9 rw console=ttyS0' -initrd /home/dzagorui/initrd.cpio -display none -chardev socket,id=charserial0,host=127.0.0.1,port=4002,telnet,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charserial1,host=127.0.0.1,port=4003,telnet,server,nowait -device isa-serial,chardev=charserial1,id=serial1

I noticed one thing, that -smp parameter affects this issue. When -smp 1 i can't reproduce this issue at all, when -smp 2 i can produce this issue only in second console (ttyS1), when -smp 4 and higher the issue produces on both consoles (ttyS1/ttyS0).
My Host cpu i5-6200U has 2 cores and 4 threads.

For reproducing was used this commands (no matter what console we use ttyS1 or ttyS0):
#dmesg > /dev/ttyS*
#dd if=/dev/zero of=/dev/ttyS*

I'm seeing this on AWS EC2 when there's (apparently) high logging volume to the console, very similarly to https://www.reddit.com/r/sysadmin/comments/6zuqad/mongodb_aws_ec2_serial8250_too_much_work_for_irq4/

On further investigation of my instance, there appeared to be no high logging volume to the console, nor anything using the /dev/ttyS0 other than agetty.  Switching from the generic kernel to the AWS kernel seems to have stabilised it. 

Further update: AWS kernel experienced the same error messages after just over 3 hours of runtime.

The QEMU project is currently considering to move its bug tracking to another system. For this we need to know which bugs are still valid and which could be closed already. Thus we are setting older bugs to "Incomplete" now.
If you still think this bug report here is valid, then please switch the state back to "New" within the next 60 days, otherwise this report will be marked as "Expired". Or mark it as "Fix Released" if the problem has been solved with a newer version of QEMU already. Thank you and sorry for the inconvenience.


[Expired for QEMU because there has been no activity for 60 days.]