virDomainGetBlockJobInfo fails during swap_volume as disk '$disk' not found in domain
Description
===========
The error handling around swap_volume does not cover the following failure: virDomainGetBlockJobInfo() fails after the entire device is detached, apparently by QEMU, partway through the block copy job. The job first pauses, then resumes, and the device is then removed from the domain:
https://8a5fc27780098c5ee1bc-3ac81d180a9c011938b2cbb0293272f3.ssl.cf5.rackcdn.com/790660/5/gate/nova-next/e915ed4/controller/logs/screen-n-cpu.txt
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver [None req-7cfcd661-29d4-4cc3-bc54-db0e7fed1a6e tempest-TestVolumeSwap-1841575704 tempest-TestVolumeSwap-1841575704-project-admin] Failure rebasing volume /dev/sdb on vdb.: libvirt.libvirtError: invalid argument: disk 'vdb' not found in domain
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver Traceback (most recent call last):
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2107, in _swap_volume
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver while not dev.is_job_complete():
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 800, in is_job_complete
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver status = self.get_job_info()
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 707, in get_job_info
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver status = self._guest._domain.blockJobInfo(self._disk, flags=0)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 190, in doit
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver result = proxy_call(self._autowrap, f, *args, **kwargs)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 148, in proxy_call
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver rv = execute(f, *args, **kwargs)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 129, in execute
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver six.reraise(c, e, tb)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver raise value
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 83, in tworker
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver rv = meth(*args, **kwargs)
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/libvirt.py", line 985, in blockJobInfo
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver raise libvirtError('virDomainGetBlockJobInfo() failed')
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver libvirt.libvirtError: invalid argument: disk 'vdb' not found in domain
May 26 09:49:47.314813 ubuntu-focal-vexxhost-ca-ymq-1-0024823853 nova-compute[114649]: ERROR nova.virt.libvirt.driver
https://zuul.opendev.org/t/openstack/build/e915ed4aeb9346bba83910bd79e9502b/log/controller/logs/libvirt/libvirtd_log.txt
2021-05-26 09:49:40.189+0000: 79419: info : qemuMonitorSend:993 : QEMU_MONITOR_SEND_MSG: mon=0x7fc4bc07e7d0 msg={"execute":"blockdev-add","arguments":{"node-name":"libvirt-4-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-4-storage"},"id":"libvirt-375"}^M
2021-05-26 09:49:46.154+0000: 79422: info : qemuMonitorSend:993 : QEMU_MONITOR_SEND_MSG: mon=0x7fc4bc07e7d0 msg={"execute":"blockdev-add","arguments":{"node-name":"libvirt-5-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-5-storage"},"id":"libvirt-379"}^M
2021-05-26 09:49:46.165+0000: 79422: debug : qemuMonitorBlockdevMirror:3112 : jobname=copy-vdb-libvirt-4-format, persistjob=1, device=libvirt-4-format, target=libvirt-5-format, bandwidth=0, granularity=0, buf_size=0, shallow=0
2021-05-26 09:49:46.167+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'created'(1)
2021-05-26 09:49:46.167+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'running'(2)
2021-05-26 09:49:46.763+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'paused'(3)
2021-05-26 09:49:46.763+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'running'(2)
2021-05-26 09:49:46.841+0000: 79417: debug : qemuProcessHandleDeviceDeleted:1362 : Device virtio-disk1 removed from domain 0x7fc4b416b0e0 instance-0000000b
2021-05-26 09:49:47.457+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'aborting'(8)
2021-05-26 09:49:47.458+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'concluded'(9)
2021-05-26 09:49:47.459+0000: 79417: debug : qemuProcessHandleJobStatusChange:1002 : job 'copy-vdb-libvirt-4-format'(domain: 0x7fc4b416b0e0,instance-0000000b) state changed to 'null'(11)
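To make the failure mode concrete, the following is a simplified, self-contained sketch of the polling pattern the traceback shows. It is NOT the actual Nova driver code or the fix under review; the libvirtError class, get_job_info() behaviour, and the string match on the error message are all stand-ins chosen so the example runs without libvirt installed. It illustrates one possible hardening: treat "disk not found in domain" as the job being gone (and the swap failed) rather than letting the exception escape the polling loop.

```python
class libvirtError(Exception):
    """Stand-in for libvirt.libvirtError (hypothetical, for illustration)."""


def get_job_info(disk_attached):
    """Stand-in for Guest.get_job_info() / virDomainGetBlockJobInfo().

    Returns None when no block job is active; raises once QEMU has
    removed the whole device, which is the failure seen in this bug.
    """
    if not disk_attached:
        raise libvirtError("invalid argument: disk 'vdb' not found in domain")
    return None  # no active job


def is_job_complete(disk_attached):
    """Poll the block job, tolerating the disk disappearing mid-job.

    If the device was deleted out from under us, the job cannot still
    be running, so report it as finished; a real driver would also have
    to surface this as a swap failure rather than a success.
    """
    try:
        return get_job_info(disk_attached) is None
    except libvirtError as e:
        if 'not found in domain' in str(e):
            # Device removed by QEMU; the copy job is gone.
            return True
        raise
```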
Steps to reproduce
==================
$ cat queries/virDomainGetBlockJobInfo.yaml
query: >
  message:"virDomainGetBlockJobInfo() failed" AND
  tags:"screen-n-cpu.txt"
$ elastic-recheck-query queries/virDomainGetBlockJobInfo.yaml
total hits: 6
build_branch
100% master
build_change
50% 786588
50% 792322
build_hostids
50% 1b47a855be51bba01ac6d5e6fdc4859bc17ebe2c8faaeb83392f8ff3 79fb0487675c0137b7ac30f24b5de71c70afb836e46746de770fa0c0
50% 33381c047c348ffefebf6b10cb7f0473c2359757d0bf11cc101eec54 33381c047c348ffefebf6b10cb7f0473c2359757d0bf11cc101eec54
build_name
100% nova-next
build_node
100% ubuntu-focal
build_queue
100% check
build_status
100% FAILURE
build_zuul_url
100% N/A
filename
100% controller/logs/screen-n-cpu.txt
log_url
50% https://89bc735e8a094e3d60b7-4f6db7cd5400cfa66e1c80fde6bd4076.ssl.cf1.rackcdn.com/792322/1/check/nova-next/de697b4/controller/logs/screen-n-cpu.txt
50% https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_035/786588/6/check/nova-next/0357703/controller/logs/screen-n-cpu.txt
loglevel
100% ERROR
module
33% nova.compute.manager
33% nova.virt.libvirt.driver
33% oslo_messaging.rpc.server
node_provider
50% ovh-bhs1
50% rax-iad
port
50% 48014
50% 58238
project
100% openstack/nova
syslog_pid
50% 107528
50% 108261
syslog_program
50% ubuntu-focal-ovh-bhs1-0024748800 nova-compute
50% ubuntu-focal-rax-iad-0024745546 nova-compute
tags
100% screen-n-cpu.txt screen oslofmt
voting
100% 1
zuul_attempts
100% 1
zuul_executor
50% ze01.opendev.org
50% ze07.opendev.org
Expected result
===============
swap_volume at least fails cleanly, leaving the original device attached.
Actual result
=============
swap_volume fails and the original device appears detached from the domain.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/
master
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
libvirt + QEMU (no KVM in the gate)
3. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
images_type=default=qcow2
4. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Logs & Configs
==============
Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/793219
I've added the QEMU project directly to this bug to see if anyone can help us understand what the underlying block job failure is within QEMU, and why it then appears to remove the entire device from the instance, causing libvirt and Nova to fall over.
Just a note that the QEMU project is moving its bug tracking to GitLab [1] and will automatically migrate Launchpad bugs, but it might be more expedient to open an issue on their GitLab tracker.
[1] https://bugs.launchpad.net/qemu/+bug/1914282/comments/3
Reviewed: https://review.opendev.org/c/openstack/nova/+/793219
Committed: https://opendev.org/openstack/nova/commit/d5ed968826895d362f4f2aa21decfdebb9b1fd84
Submitter: "Zuul (22348)"
Branch: master
commit d5ed968826895d362f4f2aa21decfdebb9b1fd84
Author: Lee Yarwood <email address hidden>
Date: Wed May 26 19:27:45 2021 +0100
zuul: Skip swap_volume tests as part of nova-next
The volume update or swap_volume API has long been a source of gate
failures within Nova. Most recently we've seen increased instability
when running the tempest.api.compute.admin.test_volume_swap tests as
part of the nova-next job as documented in bug #1929710.
This change temporarily removes the failing test from the nova-next job
while the underlying issue is identified lower in the virt stack.
Change-Id: Ib56a034fb08e309981d0b4553b8cee8d16b10152
Related-Bug: #1929710
QEMU issue opened for this bug is at https://gitlab.com/qemu-project/qemu/-/issues/287
This problem was reproduced in CI for one of my stable/wallaby patches:
https://zuul.opendev.org/t/openstack/build/3ae58b4f95d7442fb9110c4044acee2b/logs
https://1f24fdc3bf9479288b3a-225501c9cce0e8ba4bd0f4849071211d.ssl.cf1.rackcdn.com/888333/1/check/nova-next/3ae58b4/compute1/logs/libvirt/qemu/instance-00000011_log.txt
https://1f24fdc3bf9479288b3a-225501c9cce0e8ba4bd0f4849071211d.ssl.cf1.rackcdn.com/888333/1/check/nova-next/3ae58b4/compute1/logs/libvirt/libvirtd_log.txt
https://1f24fdc3bf9479288b3a-225501c9cce0e8ba4bd0f4849071211d.ssl.cf1.rackcdn.com/888333/1/check/nova-next/3ae58b4/compute1/logs/screen-n-cpu.txt
https://1f24fdc3bf9479288b3a-225501c9cce0e8ba4bd0f4849071211d.ssl.cf1.rackcdn.com/888333/1/check/nova-next/3ae58b4/job-output.txt
Extracts:
Jul 13 08:35:22.230036 np0034667036 nova-compute[54149]: WARNING nova.virt.libvirt.driver [None req-7222880d-af15-4a90-b16d-24bdf5763e09 None None] Received event <DeviceRemovedEvent: 1689237322.2269797, dd46f9ba-2352-4da6-be3a-ecf68757e3c0 => virtio-disk1> from libvirt but the driver is not waiting for it; ignored.
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver [None req-473da26b-0641-4cda-84ef-9e451e5eddb9 tempest-TestVolumeSwap-869256485 tempest-TestVolumeSwap-869256485-project-admin] Failure rebasing volume /dev/sda on vdb.: libvirt.libvirtError: invalid argument: disk 'vdb' not found in domain
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver Traceback (most recent call last):
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2128, in _swap_volume
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver while not dev.is_job_complete():
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 810, in is_job_complete
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver status = self.get_job_info()
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 717, in get_job_info
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver status = self._guest._domain.blockJobInfo(self._disk, flags=0)
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 190, in doit
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver result = proxy_call(self._autowrap, f, *args, **kwargs)
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 148, in proxy_call
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver rv = execute(f, *args, **kwargs)
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 129, in execute
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver six.reraise(c, e, tb)
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver raise value
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 83, in tworker
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver rv = meth(*args, **kwargs)
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver File "/usr/lib/python3/dist-packages/libvirt.py", line 898, in blockJobInfo
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver if ret is None: raise libvirtError ('virDomainGetBlockJobInfo() failed', dom=self)
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver libvirt.libvirtError: invalid argument: disk 'vdb' not found in domain
Jul 13 08:35:22.242368 np0034667036 nova-compute[54149]: ERROR nova.virt.libvirt.driver
2023-07-13 08:35:22.226+0000: 44724: debug : qemuMonitorEmitEvent:1198 : mon=0x7f6bcc07b080 event=DEVICE_DELETED
2023-07-13 08:35:22.226+0000: 44724: debug : qemuProcessHandleEvent:551 : vm=0x7f6bbc0b6db0
2023-07-13 08:35:22.226+0000: 44724: debug : virObjectEventNew:631 : obj=0x55ea304db760
2023-07-13 08:35:22.226+0000: 44724: debug : qemuMonitorJSONIOProcessEvent:205 : handle DEVICE_DELETED handler=0x7f6bc1e3db80 data=0x55ea304ad810
2023-07-13 08:35:22.226+0000: 44724: debug : qemuMonitorEmitDeviceDeleted:1432 : mon=0x7f6bcc07b080
2023-07-13 08:35:22.226+0000: 44724: debug : qemuProcessHandleDeviceDeleted:1364 : Device virtio-disk1 removed from domain 0x7f6bbc0b6db0 instance-00000011
2023-07-13 08:35:22.226+0000: 44724: debug : virObjectEventDispose:124 : obj=0x55ea304db760
2023-07-13 08:35:22.226+0000: 57688: debug : qemuProcessEventHandler:4866 : vm=0x7f6bbc0b6db0, event=2
2023-07-13 08:35:22.226+0000: 57688: debug : processDeviceDeletedEvent:4282 : Removing device virtio-disk1 from domain 0x7f6bbc0b6db0 instance-00000011
2023-07-13 08:35:22.226+0000: 57688: debug : qemuDomainObjBeginJobInternal:9416 : Starting job: job=modify agentJob=none asyncJob=none (vm=0x7f6bbc0b6db0 name=instance-00000011, current job=none agentJob=none async=none)
2023-07-13 08:35:22.226+0000: 57688: debug : qemuDomainObjBeginJobInternal:9470 : Started job: modify (async=none vm=0x7f6bbc0b6db0 name=instance-00000011)
2023-07-13 08:35:22.226+0000: 57688: debug : qemuDomainRemoveDiskDevice:4218 : Removing disk virtio-disk1 from domain 0x7f6bbc0b6db0 instance-00000011
2023-07-13 08:35:22.226+0000: 57688: debug : qemuDomainObjEnterMonitorInternal:9869 : Entering monitor (mon=0x7f6bcc07b080 vm=0x7f6bbc0b6db0 name=instance-00000011)
2023-07-13 08:35:22.226+0000: 57688: debug : qemuDomainObjExitMonitorInternal:9892 : Exited monitor (mon=0x7f6bcc07b080 vm=0x7f6bbc0b6db0 name=instance-00000011)
2023-07-13 08:35:22.226+0000: 57688: debug : virObjectEventNew:631 : obj=0x7f6b80029ba0
2023-07-13 08:35:22.226+0000: 44724: debug : virObjectEventDispose:124 : obj=0x7f6b80029ba0
2023-07-13 08:35:22.227+0000: 57688: debug : qemuDomainObjEndJob:9746 : Stopping job: modify (async=none vm=0x7f6bbc0b6db0 name=instance-00000011)
2023-07-13 08:35:22.233+0000: 44727: debug : virThreadJobSet:93 : Thread 44727 (virNetServerHandleJob) is now running job remoteDispatchDomainGetBlockJobInfo
2023-07-13 08:35:22.233+0000: 44727: debug : qemuDomainObjBeginJobInternal:9416 : Starting job: job=query agentJob=none asyncJob=none (vm=0x7f6bbc0b6db0 name=instance-00000011, current job=none agentJob=none async=none)
2023-07-13 08:35:22.233+0000: 44727: debug : qemuDomainObjBeginJobInternal:9470 : Started job: query (async=none vm=0x7f6bbc0b6db0 name=instance-00000011)
2023-07-13 08:35:22.233+0000: 44727: error : qemuDomainDiskByName:13629 : invalid argument: disk 'vdb' not found in domain
2023-07-13 08:35:22.233+0000: 44727: debug : qemuDomainObjEndJob:9746 : Stopping job: query (async=none vm=0x7f6bbc0b6db0 name=instance-00000011)