Live migration fails with qemu-img >= 2.10: "Failed to get shared "write" lock\nIs another process using the image?"
Looks like this is pretty new:
http://logs.openstack.org/01/503601/7/check/gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial/b19b77c/logs/screen-n-api.txt.gz?level=TRACE#_Sep_19_17_47_11_508623
Sep 19 17:47:11.508623 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: ERROR nova.api.openstack.extensions [None req-e31fde7b-317f-4db9-b225-10b6e11b2dff tempest-LiveMigrationTest-1678596498 tempest-LiveMigrationTest-1678596498] Unexpected exception in API method: MigrationError_Remote: Migration error: Disk info file is invalid: qemu-img failed to execute on /opt/stack/data/nova/instances/806812af-bf9e-4dc1-8e0c-11603ccd9f62/disk : Unexpected error while running command.
Sep 19 17:47:11.508805 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /opt/stack/data/nova/instances/806812af-bf9e-4dc1-8e0c-11603ccd9f62/disk
Sep 19 17:47:11.508946 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: Exit code: 1
Sep 19 17:47:11.509079 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: Stdout: u''
Sep 19 17:47:11.509233 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: Stderr: u'qemu-img: Could not open \'/opt/stack/data/nova/instances/806812af-bf9e-4dc1-8e0c-11603ccd9f62/disk\': Failed to get shared "write" lock\nIs another process using the image?\n'
Sep 19 17:47:11.509487 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: Traceback (most recent call last):
Sep 19 17:47:11.509649 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming
Sep 19 17:47:11.509789 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: res = self.dispatcher.dispatch(message)
Sep 19 17:47:11.510231 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 222, in dispatch
Sep 19 17:47:11.510418 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: return self._do_dispatch(endpoint, method, ctxt, args)
Sep 19 17:47:11.510555 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 192, in _do_dispatch
Sep 19 17:47:11.510687 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: result = func(ctxt, **new_args)
Sep 19 17:47:11.510829 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/opt/stack/new/nova/nova/exception_wrapper.py", line 76, in wrapped
Sep 19 17:47:11.510959 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: function_name, call_dict, binary)
Sep 19 17:47:11.511194 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Sep 19 17:47:11.511713 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: self.force_reraise()
Sep 19 17:47:11.511852 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Sep 19 17:47:11.512037 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: six.reraise(self.type_, self.value, self.tb)
Sep 19 17:47:11.512687 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/opt/stack/new/nova/nova/exception_wrapper.py", line 67, in wrapped
Sep 19 17:47:11.516811 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: return f(self, context, *args, **kw)
Sep 19 17:47:11.516966 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/opt/stack/new/nova/nova/compute/utils.py", line 876, in decorated_function
Sep 19 17:47:11.517110 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: return function(self, context, *args, **kwargs)
Sep 19 17:47:11.525392 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/opt/stack/new/nova/nova/compute/manager.py", line 217, in decorated_function
Sep 19 17:47:11.525526 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: kwargs['instance'], e, sys.exc_info())
Sep 19 17:47:11.525654 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Sep 19 17:47:11.525783 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: self.force_reraise()
Sep 19 17:47:11.525909 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Sep 19 17:47:11.526057 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: six.reraise(self.type_, self.value, self.tb)
Sep 19 17:47:11.526186 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/opt/stack/new/nova/nova/compute/manager.py", line 205, in decorated_function
Sep 19 17:47:11.526314 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: return function(self, context, *args, **kwargs)
Sep 19 17:47:11.529795 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/opt/stack/new/nova/nova/compute/manager.py", line 5378, in check_can_live_migrate_source
Sep 19 17:47:11.529952 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: block_device_info)
Sep 19 17:47:11.530083 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 5960, in check_can_live_migrate_source
Sep 19 17:47:11.530446 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: block_device_info)
Sep 19 17:47:11.530587 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 6060, in _assert_dest_node_has_enough_disk
Sep 19 17:47:11.530720 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: disk_infos = self._get_instance_disk_info(instance, block_device_info)
Sep 19 17:47:11.531945 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 7254, in _get_instance_disk_info
Sep 19 17:47:11.532099 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: block_device_info)
Sep 19 17:47:11.532234 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 7222, in _get_instance_disk_info_from_config
Sep 19 17:47:11.532366 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: backing_file = libvirt_utils.get_disk_backing_file(path)
Sep 19 17:47:11.532496 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/opt/stack/new/nova/nova/virt/libvirt/utils.py", line 197, in get_disk_backing_file
Sep 19 17:47:11.532627 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: backing_file = images.qemu_img_info(path, format).backing_file
Sep 19 17:47:11.532755 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: File "/opt/stack/new/nova/nova/virt/images.py", line 72, in qemu_img_info
Sep 19 17:47:11.532896 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: raise exception.InvalidDiskInfo(reason=msg)
Sep 19 17:47:11.533373 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /opt/stack/data/nova/instances/806812af-bf9e-4dc1-8e0c-11603ccd9f62/disk : Unexpected error while running command.
Sep 19 17:47:11.534451 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=30 -- env LC_ALL=C LANG=C qemu-img info /opt/stack/data/nova/instances/806812af-bf9e-4dc1-8e0c-11603ccd9f62/disk
Sep 19 17:47:11.534670 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: Exit code: 1
Sep 19 17:47:11.534902 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: Stdout: u''
Sep 19 17:47:11.541303 ubuntu-xenial-2-node-rax-ord-10997038 <email address hidden>[28339]: Stderr: u'qemu-img: Could not open \'/opt/stack/data/nova/instances/806812af-bf9e-4dc1-8e0c-11603ccd9f62/disk\': Failed to get shared "write" lock\nIs another process using the image?\n'
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Unexpected%20exception%20in%20API%20method%3A%20MigrationError_Remote%3A%20Migration%20error%3A%20Disk%20info%20file%20is%20invalid%3A%20qemu-img%20failed%20to%20execute%20on%5C%22%20AND%20tags%3A%5C%22screen-n-api.txt%5C%22&from=7d
252 hits starting on 9/19, check and gate, all failures.
http://tinyurl.com/lgnq48d
I don't see any related nova changes in the last 48 hours that look like they could cause this. I wonder if there is a new qemu package or something.
qemu-system 1:2.10+dfsg-0ubuntu1~cloud0
qemu-system 1:2.10+dfsg-0ubuntu1~cloud0 was released on 9/5 so we've already been using it for a couple of weeks.
It's probably this change to devstack to use the pike cloud archive:
https://review.openstack.org/#/c/501224/
Fix proposed to branch: master
Review: https://review.openstack.org/505446
See bug 1718133
Reviewed: https://review.openstack.org/505446
Committed: https://git.openstack.org/cgit/openstack-dev/devstack/commit/?id=ee22ca8373abd3b5a4c44a9c5c4da39c511195c8
Submitter: Jenkins
Branch: master
commit ee22ca8373abd3b5a4c44a9c5c4da39c511195c8
Author: Matt Riedemann <email address hidden>
Date: Wed Sep 20 00:29:36 2017 +0000
Revert "Update to using pike cloud-archive"
This reverts commit a7e9a5d447b3eeacfb52d7ddc94445058a8d6fd1.
The jobs that run live migration tests are failing at about
a rate of 50% since this merged. There are no recent changes
to nova in the last 24 hours that are related to live
migration, and this is failing on the master branch only,
so I suspect the failures are due to new qemu packages
getting pulled in from this change.
Change-Id: Ic8481539c6a0cc7af08a736a625b672979435908
Closes-Bug: #1718295
This is a behavioural change in qemu 2.10: 'qemu-img info' now needs to be passed a '--force-share' option to allow it to read a disk file that is in use by a running instance.
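To make the change concrete, here is a minimal reproduction sketch in Python (the disk path is a hypothetical placeholder; any image held open by a running guest behaves the same way):

import subprocess

# Hypothetical path to a disk image that a running guest holds open.
disk = "/var/lib/nova/instances/<uuid>/disk"

# qemu-img >= 2.10 takes image locks, so a plain 'info' on an in-use
# image fails with: Failed to get shared "write" lock
plain = subprocess.run(["qemu-img", "info", disk],
                       capture_output=True, text=True)
print(plain.returncode, plain.stderr.strip())

# --force-share opens the image without requiring the lock, so the
# metadata can be read while the guest is running.
shared = subprocess.run(["qemu-img", "info", "--force-share", disk],
                        capture_output=True, text=True)
print(shared.returncode)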
This looks like it might be an issue with the PPA in Pike.
Since this is an intentional behavioural change in upstream qemu, I'm setting that task to "Won't Fix" unless we come up with a reason to convince qemu otherwise.
One might argue that 'info' should imply --force-share or the like, but unless we make that case upstream, to be clear: no qemu change is expected.
James already mentioned bug 1718133.
@James and Matt - should we dup one of those onto the other?
Distro patch to unblock Ubuntu Pike UCA and Artful (needs conditional check for general consumption).
Setting Nova bug task back to 'New' as I think we do need to accommodate this behavioural change in qemu.
Fix proposed to branch: master
Review: https://review.openstack.org/505673
Related fix proposed to branch: master
Review: https://review.openstack.org/505674
Fix proposed to branch: master
Review: https://review.openstack.org/505748
Change abandoned by James Page (<email address hidden>) on branch: master
Review: https://review.openstack.org/505748
Reason: Abandoning in preference of alternative fix.
Reviewed: https://review.openstack.org/505673
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=807579755c4a116309eca5b2bcdbab9d1f393bab
Submitter: Zuul
Branch: master
commit 807579755c4a116309eca5b2bcdbab9d1f393bab
Author: Matt Riedemann <email address hidden>
Date: Wed Sep 20 10:44:11 2017 -0400
Support qemu >= 2.10
Qemu 2.10 added the requirement of a --force-share flag to qemu-img
info when reading information about a disk that is in use by a
guest. We do this a lot in Nova for operations like gathering
information before live migration.
Up until this point all qemu/libvirt version matching has been solely
inside the libvirt driver, however all the image manip code was moved
out to nova.virt.images. We need the version of QEMU available there.
This does it by initializing that version on driver init host. The net
effect is also that broken libvirt connections are figured out
earlier, as there is an active probe for this value.
Co-Authored-By: Sean Dague <email address hidden>
Change-Id: Iae2962bb86100f03fd3ad9aac3767da876291e74
Closes-Bug: #1718295
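For readers who just want the shape of the fix: a minimal sketch of the version gate described in the commit message (the constant and function names below are illustrative, not necessarily nova's actual identifiers):

# Sketch only; names are illustrative.
QEMU_VERSION_REQ_SHARED = (2, 10, 0)  # first qemu that enforces image locking

def qemu_img_info_cmd(path, qemu_version):
    """Build the qemu-img info command line for the detected qemu version."""
    cmd = ["qemu-img", "info", path]
    if qemu_version is not None and tuple(qemu_version) >= QEMU_VERSION_REQ_SHARED:
        # On qemu >= 2.10 the image may be locked by a running guest;
        # --force-share lets us read its metadata without taking the lock.
        cmd.insert(2, "--force-share")
    return cmd

# The version is probed once at driver init (as the commit describes)
# and then reused for every qemu-img invocation:
qemu_img_info_cmd("/path/to/disk", (2, 10, 0))
# -> ['qemu-img', 'info', '--force-share', '/path/to/disk']
qemu_img_info_cmd("/path/to/disk", (2, 9, 0))
# -> ['qemu-img', 'info', '/path/to/disk']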
Reviewed: https://review.openstack.org/505674
Committed: https://git.openstack.org/cgit/openstack-dev/devstack/commit/?id=917ad0998be8c48bfcc0e3031bc1b75cd9ed1927
Submitter: Zuul
Branch: master
commit 917ad0998be8c48bfcc0e3031bc1b75cd9ed1927
Author: Matt Riedemann <email address hidden>
Date: Wed Sep 20 14:46:48 2017 +0000
Update to using pike cloud-archive
This reverts commit ee22ca8373abd3b5a4c44a9c5c4da39c511195c8
Depends-On: Iae2962bb86100f03fd3ad9aac3767da876291e74
Change-Id: I4d5fa052bdc5eef1795f6507589e2eaf4e093e23
Related-Bug: #1718295
Is this going to be backported to Pike? I ran into the same issue with openstack-ansible on Pike.
Fix proposed to branch: stable/pike
Review: https://review.openstack.org/509774
Change abandoned by Kevin Lefevre (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/509774
Reason: sorry
Reviewed: https://review.openstack.org/509774
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5e9508b77f86f1fadb173a071e2519aad24534f9
Submitter: Jenkins
Branch: stable/pike
commit 5e9508b77f86f1fadb173a071e2519aad24534f9
Author: Matt Riedemann <email address hidden>
Date: Wed Sep 20 10:44:11 2017 -0400
Support qemu >= 2.10
Qemu 2.10 added the requirement of a --force-share flag to qemu-img
info when reading information about a disk that is in use by a
guest. We do this a lot in Nova for operations like gathering
information before live migration.
Up until this point all qemu/libvirt version matching has been solely
inside the libvirt driver, however all the image manip code was moved
out to nova.virt.images. We need the version of QEMU available there.
This does it by initializing that version on driver init host. The net
effect is also that broken libvirt connections are figured out
earlier, as there is an active probe for this value.
Co-Authored-By: Sean Dague <email address hidden>
Change-Id: Iae2962bb86100f03fd3ad9aac3767da876291e74
Closes-Bug: #1718295
(cherry picked from commit 807579755c4a116309eca5b2bcdbab9d1f393bab)
This issue was fixed in the openstack/nova 17.0.0.0b1 development milestone.
This issue was fixed in the openstack/nova 16.0.2 release.
I think I am experiencing this issue on 15.1.13
If it's not going to be backported to Ocata, can anyone suggest a workaround?
@Filippo: your options are (1) use an older version of QEMU or (2) backport the change to your Ocata-based deployment and run with it out of tree.
Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/536798
Change abandoned by Lee Yarwood (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/536798
Hi, because of a power outage most of our compute nodes shut down unexpectedly, and now I can't start our instances. I am getting the error "Failed to get "write" lock / Is another process using the image?". The full error log is at http://paste.openstack.org/show/754107/. My environment is OpenStack Pike, the instances are on NFS shared storage, and the Nova version is 16.1.6.dev2. This fix cannot solve the problem completely. What would you suggest to solve it?
Maybe your files are still in a locked state on the NFS system.
You could check that with qemu-img operations; there is no need to start the guest (see the sketch below).
If they are, you might need to remove that lock bit (not sure how at the moment).
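A rough way to probe the lock state without starting the guest, sketched in Python (the disk path is hypothetical):

import subprocess

# Hypothetical instance disk on the NFS share.
disk = "/var/lib/nova/instances/<uuid>/disk"

# With no guest running, a plain 'info' should succeed. If it still
# fails with the lock error, the NFS server is likely holding a stale
# lock on behalf of the crashed client.
probe = subprocess.run(["qemu-img", "info", disk],
                       capture_output=True, text=True)
if probe.returncode != 0 and "lock" in probe.stderr:
    print("image still appears locked:", probe.stderr.strip())
    # Next steps are outside qemu-img: inspect locks on the NFS server
    # (e.g. with lslocks/lsof) and release stale locks for the dead client.
else:
    print("image readable; locking is not the problem")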
Fix proposed to branch: stable/ocata
Review: https://review.opendev.org/693851
Change abandoned by "Elod Illes <email address hidden>" on branch: stable/ocata
Review: https://review.opendev.org/c/openstack/nova/+/693851
Reason: stable/ocata of openstack/nova transitioned to End of Life ( https://review.opendev.org/c/openstack/releases/+/795664 ), to be able to delete the branch we need to abandon all open patches on the branch.