path: root/gitlab/issues/target_missing/host_missing/accel_missing/1601.toml
blob: 888c4123b4779745047fd33a4bf053103dc5c95d
id = 1601
title = "QEMU Guest Agent (qga) high CPU usage (1 core at 100%). May happen with guest-network-get-interfaces. Strace says: EAGAIN (Resource temporarily unavailable)"
state = "opened"
created_at = "2023-04-13T16:11:33.310Z"
closed_at = "n/a"
labels = ["Guest Agent"]
url = "https://gitlab.com/qemu-project/qemu/-/issues/1601"
host-os = "Fedora 37"
host-arch = "x86_64"
qemu-version = "QEMU emulator version 7.0.0 (qemu-7.0.0-15.fc37)"
guest-os = "Fedora 37"
guest-arch = "x86_64"
description = """I have a VM with the QEMU guest agent (QGA) installed. I use QGA to periodically query the guest's network interfaces: I execute `guest-network-get-interfaces` roughly once every 1-2 seconds.

After a while (maybe a day or so), QGA seems to lock up with one core at 100% CPU. It stops replying to commands, and restarting the service sometimes doesn't work, so a hard reboot is needed.

`dmesg` doesn't show anything useful or relevant. After editing `qemu-guest-agent.service` to run the agent under `/usr/bin/strace`, I see this repeated in a loop:

```
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
strace[114154]: write(4, "{\\"return\\": [{\\"name\\": \\"lo\\", \\"ip-a"..., 2047) = -1 EAGAIN (Resource temporarily unavailable)
```
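
For illustration only, and not as a diagnosis: `EAGAIN` is what a non-blocking `write` returns when the channel's buffer is full because the other end is not reading. The minimal sketch below reproduces that error in isolation. It assumes a local socket pair as a stand-in for the virtio-serial channel QGA actually writes to, and `fill_until_eagain` is a hypothetical helper name, not part of QGA.

```python
import errno
import socket


def fill_until_eagain(chunk_size=2047, max_writes=100_000):
    # Hypothetical helper: a connected pair of local sockets stands in for
    # the agent's channel (assumption; QGA uses a virtio-serial port).
    writer, reader = socket.socketpair()
    try:
        writer.setblocking(False)
        payload = b'x' * chunk_size
        sent = 0
        for _ in range(max_writes):
            try:
                # The reader never calls recv(), so the buffer only fills up.
                sent += writer.send(payload)
            except BlockingIOError as exc:
                # The kernel refuses the write: EAGAIN / EWOULDBLOCK.
                return sent, exc.errno
        return sent, None
    finally:
        writer.close()
        reader.close()


sent, err = fill_until_eagain()
print('bytes accepted before the write failed:', sent)
print('errno:', err, errno.errorcode.get(err))
```

A writer that retries such a write immediately, instead of waiting for the descriptor to become writable, would spin on this error. That would be consistent with the trace above, but whether QGA does this is not confirmed here.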

I don't know enough to debug this further, but I can provide more information if given some guidance.

**I don't know whether it is relevant**, but the guest VM runs Docker with around 10 containers, so when QGA works it reports around 18 network interfaces, counting loopback, Docker `veth`s, and `br` interfaces."""
reproduce = """1. Create a VM with Fedora 37
2. Install the QEMU Guest Agent
3. Call `guest-network-get-interfaces` in a loop every 1-2 seconds (waiting for each call to finish) over the QGA Unix socket, using the provided Python script, invoked as: `python qga.py --socket /run/test-vm-108.qga '{ "execute": "guest-network-get-interfaces" }'`
4. Eventually, the guest agent will lock up at 100% CPU usage on 1 core"""
additional = """Python script used to call QGA:
```
import argparse
import socket
import sys

def main():
    buf_size = 1024
    timeout_secs = .5

    parser = argparse.ArgumentParser()
    parser.add_argument('--socket', required=True, help='Path to Unix socket')
    parser.add_argument('request', help='Request to send')
    args = parser.parse_args()

    unix_socket_path = args.socket
    request = args.request

    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout_secs)
            sock.connect(unix_socket_path)

            request_bytes = request.encode('utf-8')
            sock.sendall(request_bytes)

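            # Block (up to the timeout) for the first chunk of the response.
            response_bytes = b''
            received_bytes = sock.recv(buf_size)
            response_bytes += received_bytes

            # Switch to non-blocking mode and drain whatever has already
            # arrived. The loop stops as soon as no data is immediately
            # available, even if the response is not yet complete.
            sock.setblocking(False)
            while True:
                try:
                    received_bytes = sock.recv(buf_size)
                    if not received_bytes:
                        # Empty read: the peer closed the connection.
                        break
                    response_bytes += received_bytes
                except (BlockingIOError, TimeoutError):
                    break
                except (FileNotFoundError, ConnectionRefusedError):
                    sock.close()
                    sys.exit()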

            response = response_bytes.decode('utf-8').strip()
            print(response)

    except (TimeoutError, FileNotFoundError, BlockingIOError, ConnectionRefusedError):
        sys.exit()

if __name__ == "__main__":
    main()
```"""