|
id = 1601
title = "QEMU Guest Agent (qga) high CPU usage (1 core at 100%). May happen with guest-network-get-interfaces. Strace says: EAGAIN (Resource temporarily unavailable)"
state = "opened"
created_at = "2023-04-13T16:11:33.310Z"
closed_at = "n/a"
labels = ["Guest Agent"]
url = "https://gitlab.com/qemu-project/qemu/-/issues/1601"
host-os = "Fedora 37"
host-arch = "x86_64"
qemu-version = "QEMU emulator version 7.0.0 (qemu-7.0.0-15.fc37)"
guest-os = "Fedora 37"
guest-arch = "x86_64"
description = """I have a VM with the QEMU guest agent installed. I use QGA to poll the guest's network interfaces, running `guest-network-get-interfaces` roughly every 1-2 seconds.
After a while (maybe a day or so), QGA locks up with one core at 100% CPU. It stops replying to commands, and restarting the service sometimes doesn't work, so a hard reboot is needed.
`dmesg` doesn't show anything relevant. After editing `qemu-guest-agent.service` to run the agent under `/usr/bin/strace`, I get this in a loop:
```
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
```
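For context, `EAGAIN` on a non-blocking `write` generally means the kernel buffer for that descriptor is full because the other end is not draining it. The sketch below is not QGA code and does not use the agent's virtio-serial channel. It assumes a plain Unix socket pair as a stand-in, and the helper name `fill_until_eagain` is hypothetical. It only shows how a writer that keeps retrying without waiting for writability would see the same error on every attempt:

```python
import errno
import socket


def fill_until_eagain(chunk_size=2047):
    # Stand-in for the agent's channel: a connected pair of Unix stream
    # sockets where the reader end never reads (assumption, not QGA code).
    writer, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    writer.setblocking(False)
    sent = 0
    try:
        while True:
            try:
                sent += writer.send(b'x' * chunk_size)
            except BlockingIOError as exc:
                # The buffer is full. Retrying immediately would fail the
                # same way each time, which is a busy loop.
                return sent, exc.errno == errno.EAGAIN
    finally:
        writer.close()
        reader.close()


if __name__ == '__main__':
    sent, is_eagain = fill_until_eagain()
    print('bytes accepted before the buffer filled:', sent)
    print('failed with EAGAIN:', is_eagain)
```

A writer that waits for the descriptor to become writable (for example with `select` or `poll`) before retrying would sleep instead of spinning. Whether QGA's write path does this is not something the trace above can confirm.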
I don't know how to debug this further, but I can provide more information if given some guidance.
**Don't know if it helps/affects**, but the guest VM is running Docker with around 10 containers or so, so when QGA works, I get around 18 network interfaces, counting loopback, docker `veth`s and `br` interfaces."""
reproduce = """1. Create a VM with Fedora 37
2. Install the QEMU Guest Agent
3. Call `guest-network-get-interfaces` every 1-2 seconds (each call starting after the previous one finishes) over the QGA Unix socket using the provided Python script, invoked as: `python qga.py --socket /run/test-vm-108.qga '{ "execute": "guest-network-get-interfaces" }'`
4. Eventually, the guest agent will lock up at 100% CPU usage on 1 core"""
additional = """Python script used to call QGA:
```
import argparse
import socket
import sys


def main():
    buf_size = 1024
    timeout_secs = .5
    parser = argparse.ArgumentParser()
    parser.add_argument('--socket', required=True, help='Path to Unix socket')
    parser.add_argument('request', help='Request to send')
    args = parser.parse_args()
    unix_socket_path = args.socket
    request = args.request
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout_secs)
            sock.connect(unix_socket_path)
            request_bytes = request.encode('utf-8')
            sock.sendall(request_bytes)
            response_bytes = b''
            received_bytes = sock.recv(buf_size)
            response_bytes += received_bytes
            sock.setblocking(False)
            while True:
                try:
                    received_bytes = sock.recv(buf_size)
                    if not received_bytes:
                        break
                    response_bytes += received_bytes
                except (BlockingIOError, TimeoutError):
                    break
                except (FileNotFoundError, ConnectionRefusedError):
                    sock.close()
                    sys.exit()
            response = response_bytes.decode('utf-8').strip()
            print(response)
    except (TimeoutError, FileNotFoundError, BlockingIOError, ConnectionRefusedError):
        sys.exit()


if __name__ == "__main__":
    main()
```"""
|