graphic: 0.958
other: 0.953
semantic: 0.943
device: 0.931
assembly: 0.924
network: 0.907
mistranslation: 0.899
instruction: 0.888
boot: 0.842
vnc: 0.754
KVM: 0.750
socket: 0.736

[arm64 ocata] newly created instances are unable to raise network interfaces

arm64, Ocata.

I'm testing to see if I can get Ocata running on arm64, using the openstack-base bundle to deploy it. I have added the bundle to the log file attached to this bug.

When I create a new instance via nova, the VM comes up and runs, but it fails to raise its eth0 interface. This occurs on both internal and external networks.

ubuntu@openstackaw:~$ nova list
+--------------------------------------+---------+--------+------------+-------------+--------------------+
| ID                                   | Name    | Status | Task State | Power State | Networks           |
+--------------------------------------+---------+--------+------------+-------------+--------------------+
| dcaf6d51-f81e-4cbd-ac77-0c5d21bde57c | sfeole1 | ACTIVE | -          | Running     | internal=10.5.5.3  |
| aa0b8aee-5650-41f4-8fa0-aeccdc763425 | sfeole2 | ACTIVE | -          | Running     | internal=10.5.5.13 |
+--------------------------------------+---------+--------+------------+-------------+--------------------+
ubuntu@openstackaw:~$ nova show aa0b8aee-5650-41f4-8fa0-aeccdc763425
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | awrep3                                                   |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | awrep3.maas                                              |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000003                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2017-09-24T14:23:08.000000                               |
| OS-SRV-USG:terminated_at             | -                                                        |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| config_drive                         |                                                          |
| created                              | 2017-09-24T14:22:41Z                                     |
| flavor                               | m1.small (717660ae-0440-4b19-a762-ffeb32a0575c)          |
| hostId                               | 5612a00671c47255d2ebd6737a64ec9bd3a5866d1233ecf3e988b025 |
| id                                   | aa0b8aee-5650-41f4-8fa0-aeccdc763425                     |
| image                                | zestynosplash (e88fd1bd-f040-44d8-9e7c-c462ccf4b945)     |
| internal network                     | 10.5.5.13                                                |
| key_name                             | mykey                                                    |
| metadata                             | {}                                                       |
| name                                 | sfeole2                                                  |
| os-extended-volumes:volumes_attached | []                                                       |
| progress                             | 0                                                        |
| security_groups                      | default                                                  |
| status                               | ACTIVE                                                   |
| tenant_id                            | 9f7a21c1ad264fec81abc09f3960ad1d                         |
| updated                              | 2017-09-24T14:23:09Z                                     |
| user_id                              | e6bb6f5178a248c1b5ae66ed388f9040                         |
+--------------------------------------+----------------------------------------------------------+



As seen above, the instances boot and run. Full console output is attached to this bug.


[  OK  ] Started Initial cloud-init job (pre-networking).
[  OK  ] Reached target Network (Pre).
[  OK  ] Started AppArmor initialization.
         Starting Raise network interfaces...
[FAILED] Failed to start Raise network interfaces.
See 'systemctl status networking.service' for details.
         Starting Initial cloud-init job (metadata service crawler)...
[  OK  ] Reached target Network.
[  315.051902] cloud-init[881]: Cloud-init v. 0.7.9 running 'init' at Fri, 22 Sep 2017 18:29:15 +0000. Up 314.70 seconds.
[  315.057291] cloud-init[881]: ci-info: +++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++
[  315.060338] cloud-init[881]: ci-info: +--------+------+-----------+-----------+-------+-------------------+
[  315.063308] cloud-init[881]: ci-info: | Device |  Up  |  Address  |    Mask   | Scope |     Hw-Address    |
[  315.066304] cloud-init[881]: ci-info: +--------+------+-----------+-----------+-------+-------------------+
[  315.069303] cloud-init[881]: ci-info: | eth0:  | True |     .     |     .     |   .   | fa:16:3e:39:4c:48 |
[  315.072308] cloud-init[881]: ci-info: | eth0:  | True |     .     |     .     |   d   | fa:16:3e:39:4c:48 |
[  315.075260] cloud-init[881]: ci-info: |  lo:   | True | 127.0.0.1 | 255.0.0.0 |   .   |         .         |
[  315.078258] cloud-init[881]: ci-info: |  lo:   | True |     .     |     .     |   d   |         .         |
[  315.081249] cloud-init[881]: ci-info: +--------+------+-----------+-----------+-------+-------------------+
[  315.084240] cloud-init[881]: 2017-09-22 18:29:15,393 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0xffffb10794e0>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]




----------------



I have checked all neutron services on the neutron-gateway, made sure that they are running, and restarted them:


 [ + ]  neutron-dhcp-agent
 [ + ]  neutron-l3-agent
 [ + ]  neutron-lbaasv2-agent
 [ + ]  neutron-metadata-agent
 [ + ]  neutron-metering-agent
 [ + ]  neutron-openvswitch-agent
 [ + ]  neutron-ovs-cleanup

and have also restarted the neutron counterparts on the nova-compute hosts and checked their logs (a sketch of the kind of commands involved follows the list below):


 [ + ]  neutron-openvswitch-agent
 [ + ]  neutron-ovs-cleanup
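
For reference, a minimal sketch of the kind of check/restart pass described above, assuming the neutron agents run as regular system services as on a standard Ubuntu cloud-archive install (service names as listed above; log file name may differ per agent):

$ service --status-all 2>&1 | grep neutron                        # prints the [ + ]/[ - ] state shown above
$ sudo systemctl restart neutron-openvswitch-agent
$ sudo systemctl status neutron-openvswitch-agent --no-pager
$ sudo tail -n 50 /var/log/neutron/neutron-openvswitch-agent.log  # location per the /var/log/neutron/ note below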



 There are some warnings/errors in the neutron-gateway logs; the full tarball is attached separately to this bug:

2017-09-24 14:39:33.152 10322 INFO ryu.base.app_manager [-] instantiating app ryu.controller.ofp_handler of OFPHandler
2017-09-24 14:39:33.153 10322 INFO ryu.base.app_manager [-] instantiating app ryu.app.ofctl.service of OfctlService
2017-09-24 14:39:39.577 10322 ERROR neutron.agent.ovsdb.impl_vsctl [req-2f084ae8-13dc-47dc-b351-24c8f3c57067 - - - - -] Unable to execute ['ovs-vsctl', '--timeout=10', '--oneline', '--format=json', '--', '--id=@manager', 'create', 'Manager', 'target="ptcp:6640:127.0.0.1"', '--', 'add', 'Open_vSwitch', '.', 'manager_options', '@manager']. Exception: Exit code: 1; Stdin: ; Stdout: ; Stderr: ovs-vsctl: transaction error: {"details":"Transaction causes multiple rows in \"Manager\" table to have identical values (\"ptcp:6640:127.0.0.1\") for index on column \"target\".  First row, with UUID e02a5f7f-bfd2-4a1d-ae3c-0321db4bd3fb, existed in the database before this transaction and was not modified by the transaction.  Second row, with UUID 6e9aba3a-471a-4976-bffd-b7131bbe5377, was inserted by this transaction.","error":"constraint violation"}



These warnings/errors also occur on the nova-compute hosts in /var/log/neutron/




2017-09-22 18:54:52.130 387556 INFO ryu.base.app_manager [-] instantiating app ryu.app.ofctl.service of OfctlService
2017-09-22 18:54:56.124 387556 ERROR neutron.agent.ovsdb.impl_vsctl [req-e291c2f9-a123-422c-be7c-fadaeb5decfa - - - - -] Unable to execute ['ovs-vsctl', '--timeout=10', '--oneline', '--format=json', '--', '--id=@manager', 'create', 'Manager', 'target="ptcp:6640:127.0.0.1"', '--', 'add', 'Open_vSwitch', '.', 'manager_options', '@manager']. Exception: Exit code: 1; Stdin: ; Stdout: ; Stderr: ovs-vsctl: transaction error: {"details":"Transaction causes multiple rows in \"Manager\" table to have identical values (\"ptcp:6640:127.0.0.1\") for index on column \"target\".  First row, with UUID 9f27ddee-9881-4cbc-9777-2f42fee735e9, was inserted by this transaction.  Second row, with UUID ccf0e097-09d5-449c-b353-6b69781dc3f7, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}



I'm not sure whether the above error pertains to the failure; I have attached those logs to this bug as well.


(neutron) agent-list
+--------------------------------------+----------------------+--------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type           | host   | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+----------------------+--------+-------------------+-------+----------------+---------------------------+
| 0cca03fb-abb2-4704-8b0b-e7d3e117d882 | DHCP agent           | awrep1 | nova              | :-)   | True           | neutron-dhcp-agent        |
| 14a5fd52-fbc3-450c-96d5-4e9a65776dad | L3 agent             | awrep1 | nova              | :-)   | True           | neutron-l3-agent          |
| 2ebc7238-5e61-41f8-bc60-df14ec6b226b | Loadbalancerv2 agent | awrep1 |                   | :-)   | True           | neutron-lbaasv2-agent     |
| 4f6275be-fc8b-4994-bdac-13a4b76f6a83 | Metering agent       | awrep1 |                   | :-)   | True           | neutron-metering-agent    |
| 86ecc6b0-c100-4298-b861-40c17516cc08 | Open vSwitch agent   | awrep1 |                   | :-)   | True           | neutron-openvswitch-agent |
| 947ad3ab-650b-4b96-a520-00441ecb33e7 | Open vSwitch agent   | awrep4 |                   | :-)   | True           | neutron-openvswitch-agent |
| 996b0692-7d19-4641-bec3-e057f4a856f6 | Open vSwitch agent   | awrep3 |                   | :-)   | True           | neutron-openvswitch-agent |
| ab6b1065-0b98-4cf3-9f46-6bddba0c5e75 | Metadata agent       | awrep1 |                   | :-)   | True           | neutron-metadata-agent    |
| fe24f622-b77c-4eed-ae22-18e4195cf763 | Open vSwitch agent   | awrep2 |                   | :-)   | True           | neutron-openvswitch-agent |
+--------------------------------------+----------------------+--------+-------------------+-------+----------------+---------------------------+


Should neutron-openvswitch be assigned to the 'nova' availability zone? 

I have also attached the ovs-vsctl show output from a nova-compute host and the neutron-gateway host to confirm that the Open vSwitch setup is correct.


I can supply more logs if required.









The juju status, bundle file, VM console output, and nova service list can be found in ocata-arm64-neutronbug.txt.

Running tcpdump on the guest's tap device shows the DHCP request leaving and the reply coming back.
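
For reference, a minimal sketch of that kind of capture, assuming the tap device name is read from the instance's virsh dumpxml (the names below are the ones quoted later in this bug and are illustrative):

$ sudo virsh dumpxml instance-00000003 | grep 'target dev'
$ sudo tcpdump -n -e -i tapc8faaf66-7f port 67 or port 68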

This may be qemu related rather than libvirt (further investigation required), but it is unlikely to be a problem with the neutron-gateway charm.

I have confirmed this bug with another arm64 + ocata deployment. 

Further summary:

We can see DHCP requests leaving and the replies returning to the tap interface associated with the guest, but the guest does not register the returned packet (it does not acquire an address).

I was also able to reproduce this on Cavium ThunderX hardware.


ii  ipxe-qemu                            1.0.0+git-20150424.a25a16d-1ubuntu1        all          PXE boot firmware - ROM images for qemu
ii  qemu-block-extra:arm64               1:2.8+dfsg-3ubuntu2.3~cloud0               arm64        extra block backend modules for qemu-system and qemu-utils
ii  qemu-efi                             0~20160408.ffea0a2c-2                      all          UEFI firmware for virtual machines
ii  qemu-kvm                             1:2.8+dfsg-3ubuntu2.3~cloud0               arm64        QEMU Full virtualization
ii  qemu-system-arm                      1:2.8+dfsg-3ubuntu2.3~cloud0               arm64        QEMU full system emulation binaries (arm)
ii  qemu-system-common                   1:2.8+dfsg-3ubuntu2.3~cloud0               arm64        QEMU full system emulation binaries (common files)
ii  qemu-utils                           1:2.8+dfsg-3ubuntu2.3~cloud0               arm64        QEMU utilities


-------------------

ii  libvirt-bin                          2.5.0-3ubuntu5.4~cloud0                    arm64        programs for the libvirt library
ii  libvirt-clients                      2.5.0-3ubuntu5.4~cloud0                    arm64        Programs for the libvirt library
ii  libvirt-daemon                       2.5.0-3ubuntu5.4~cloud0                    arm64        Virtualization daemon
ii  libvirt-daemon-system                2.5.0-3ubuntu5.4~cloud0                    arm64        Libvirt daemon configuration files
ii  libvirt0:arm64                       2.5.0-3ubuntu5.4~cloud0                    arm64        library for interfacing with different virtualization systems
ii  nova-compute-libvirt                 2:15.0.6-0ubuntu1~cloud0                   all          OpenStack Compute - compute node libvirt support
ii  python-libvirt                       3.0.0-2~cloud0                             arm64        libvirt Python bindings


To further add to comment #6 that this is not a neutron issue, here is the tcpdump output from the guest's tap device, showing the DHCP request leaving and the reply coming back, for anyone who may be curious.


Guest MAC:  

    <interface type='bridge'>
      <mac address='fa:16:3e:1d:a8:82'/>
      <source bridge='qbrc8faaf66-7f'/>
      <target dev='tapc8faaf66-7f'/>
      <model type='virtio'/>
      <address type='virtio-mmio'/>
    </interface>



$ sudo ip netns exec qdhcp-a9958ab4-8a7e-4ded-b9a0-860bc42f79d9 tcpdump -A -l -i ns-c751afb3-2b
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ns-c751afb3-2b, link-type EN10MB (Ethernet), capture size 262144 bytes
13:51:11.458941 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
`....$..................................:...................................
13:51:11.462966 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1d:a8:82 (oui Unknown), length 300
E..H......9..........D.C.4q.....5QN{......................>.............................................................................................................................................................................................................c.Sc5....ubuntu7.......w.,/.y*..................................
13:51:11.463331 IP 10.5.5.2.bootps > 10.5.5.9.bootpc: BOOTP/DHCP, Reply, length 328
E..dY'..@...




Today I ran some tests and installed a Newton deployment on arm64, which we already know works. I then upgraded QEMU and libvirt on one of the hypervisors to the versions from the xenial-updates/ocata cloud archive.

See attached notes. 

Libvirt - 1.3.1-1ubuntu10.14 -> 2.5.0-3ubuntu5.5~cloud0
QEMU - 1:2.5+dfsg-5ubuntu10.16 -> 1:2.8+dfsg-3ubuntu2.3~cloud0

I was able to reset the already-built instance on that hypervisor and received a DHCP response on the OVS tap device. eth0 came up as expected with an internal tenant IP.


Steps to reproduce:

1.) Install Newton and start a few VMs.
2.) Choose one hypervisor and upgrade qemu and libvirt to the versions from the Ocata cloud archive (a sketch of the upgrade commands follows this list).
3.) Reset the running VM so that it now runs with the latest QEMU/libvirt.
4.) Reset the instance and see whether it boots and its network can be reached.
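
For step 2, a minimal sketch of the kind of upgrade involved, assuming the Ubuntu Cloud Archive is used for the Ocata packages (package selection may vary per node):

$ sudo add-apt-repository cloud-archive:ocata
$ sudo apt-get update
$ sudo apt-get install --only-upgrade qemu-kvm qemu-system-arm libvirt-bin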



Taken from the upgraded hypervisor:

ubuntu@awrep3:/var/lib/nova/instances/2cec409e-de92-4d29-ad68-3f1d1f8be7fc$ sudo qemu-system-aarch64 --version
QEMU emulator version 2.8.0(Debian 1:2.8+dfsg-3ubuntu2.3~cloud0)
Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers


ubuntu@awrep3:/var/lib/nova/instances/2cec409e-de92-4d29-ad68-3f1d1f8be7fc$ sudo libvirtd --version
libvirtd (libvirt) 2.5.0



I was able to gather libvirt XMLs from both the Newton and Ocata instances; see attached.

sfeole@BSG-75:~$ diff xmlocata xmlnewton 
1,2c1,2
< 
< main type='kvm' id='1'>
---
> ubuntu@lundmark:/var/lib/nova/instances/358596e4-135d-461d-a514-84116440014c$ sudo virsh dumpxml instance-00000001
> <domain type='kvm' id='1'>
4c4
<   <uuid>7c0dcd78-d6b4-4575-a882-ee5d29c64fe0</uuid>
---
>   <uuid>358596e4-135d-461d-a514-84116440014c</uuid>
7,9c7,9
<       <nova:package version="15.0.6"/>
<       <nova:name>sfeole14</nova:name>
<       <nova:creationTime>2017-10-05 00:58:15</nova:creationTime>
---
>       <nova:package version="14.0.8"/>
>       <nova:name>sfeole-newton</nova:name>
>       <nova:creationTime>2017-10-05 01:40:39</nova:creationTime>
18,19c18,19
<         <nova:user uuid="d9f92a61c37948d9a29b8cc37e1bca05">admin</nova:user>
<         <nova:project uuid="701441267bd148d3842f1696530b1c92">admin</nova:project>
---
>         <nova:user uuid="12d13712253141ab845b89406507cd6c">admin</nova:user>
>         <nova:project uuid="8b8fcaf183954f45b3eb15b27c52ec94">admin</nova:project>
21c21
<       <nova:root type="image" uuid="f329117f-5da2-4d61-8341-89a969bf00e7"/>
---
>       <nova:root type="image" uuid="fcdf7c26-4238-4594-b8ee-fd59f601fdcb"/>
34c34
<     <type arch='aarch64' machine='virt-2.8'>hvm</type>
---
>     <type arch='aarch64' machine='virt'>hvm</type>
58c58
<       <source file='/var/lib/nova/instances/7c0dcd78-d6b4-4575-a882-ee5d29c64fe0/disk'/>
---
>       <source file='/var/lib/nova/instances/358596e4-135d-461d-a514-84116440014c/disk'/>
61c61
<         <source file='/var/lib/nova/instances/_base/035458feead7f83be80ed020442ebf815a4067ea'/>
---
>         <source file='/var/lib/nova/instances/_base/ab7429e8558ea27fb54f70eb92fbd0c1c07dd595'/>
70c70
<       <source file='/var/lib/nova/instances/7c0dcd78-d6b4-4575-a882-ee5d29c64fe0/disk.eph0'/>
---
>       <source file='/var/lib/nova/instances/358596e4-135d-461d-a514-84116440014c/disk.eph0'/>
82a83,93
>     <controller type='pci' index='1' model='dmi-to-pci-bridge'>
>       <model name='i82801b11-bridge'/>
>       <alias name='pci.1'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
>     </controller>
>     <controller type='pci' index='2' model='pci-bridge'>
>       <model name='pci-bridge'/>
>       <target chassisNr='2'/>
>       <alias name='pci.2'/>
>       <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
>     </controller>
84,86c95,97
<       <mac address='fa:16:3e:a6:4b:d4'/>
<       <source bridge='qbra7012530-32'/>
<       <target dev='tapa7012530-32'/>
---
>       <mac address='fa:16:3e:8e:fc:48'/>
>       <source bridge='qbrb5d335fc-46'/>
>       <target dev='tapb5d335fc-46'/>
92c103
<       <source path='/var/lib/nova/instances/7c0dcd78-d6b4-4575-a882-ee5d29c64fe0/console.log'/>
---
>       <source path='/var/lib/nova/instances/358596e4-135d-461d-a514-84116440014c/console.log'/>
95a107,111
>     <serial type='pty'>
>       <source path='/dev/pts/3'/>
>       <target port='1'/>
>       <alias name='serial1'/>
>     </serial>
97c113
<       <source path='/var/lib/nova/instances/7c0dcd78-d6b4-4575-a882-ee5d29c64fe0/console.log'/>
---
>       <source path='/var/lib/nova/instances/358596e4-135d-461d-a514-84116440014c/console.log'/>
108,113c124,125
<     <label>libvirt-7c0dcd78-d6b4-4575-a882-ee5d29c64fe0</label>
<     <imagelabel>libvirt-7c0dcd78-d6b4-4575-a882-ee5d29c64fe0</imagelabel>
<   </seclabel>
<   <seclabel type='dynamic' model='dac' relabel='yes'>
<     <label>+64055:+117</label>
<     <imagelabel>+64055:+117</imagelabel>
---
>     <label>libvirt-358596e4-135d-461d-a514-84116440014c</label>
>     <imagelabel>libvirt-358596e4-135d-461d-a514-84116440014c</imagelabel>
sfeole@BSG-75:~$ 


Thanks so much for doing that Sean.

Omitting expected changes (uuid, mac address, etc.), here are the significant changes I see:

1) N uses the QEMU 'virt' machine type, O uses 'virt-2.8'.
2) N and O both expose a PCI root, but N also exposes two PCI bridges that O does not.
3) N exposes an additional serial device.
4) N and O both use an apparmor seclabel. However, O also has a DAC model.

#4 is the most interesting to me. Is there a way to configure ocata nova to not enable DAC?

Hi,
I was reading into this after being back from PTO (actually back next Monday).
I had been wondering about this, as I tested just that (without OpenStack) last week with dannf on the Rally.
And indeed it seems to work for me on Artful (which, as we all know, is equal to Ocata in terms of software stack).

$ cat arm-template.xml 
<domain type='kvm'>
        <os>
                <type arch='aarch64' machine='virt'>hvm</type>
                <loader readonly='yes' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>
                <nvram template='/usr/share/AAVMF/AAVMF_CODE.fd'>/tmp/AAVMF_CODE.fd</nvram>
                <boot dev='hd'/>
        </os>
        <features>
                <acpi/>
                <apic/>
                <pae/>
        </features>
        <cpu mode='custom' match='exact'>
                <model fallback='allow'>host</model>
        </cpu>
        <devices>
                <interface type='network'>
                        <source network='default'/>
                        <model type='virtio'/>
                </interface>
                <serial type='pty'>
                        <source path='/dev/pts/3'/>
                        <target port='0'/>
                </serial>
        </devices>
</domain>

$ uvt-kvm create --template arm-template.xml --password=ubuntu artful-test4 release=artful arch=arm64 label=daily

The template is written so that libvirt and qemu fill in as many defaults as possible. That way, if one of them changes, we do not have to follow and adapt.
Maybe such a change is affecting you via the more "defined" XML that OpenStack is sending.

From the host's point of view all looks normal in the journal; you see the start and the DHCP discover/offer/ack sequence:
Okt 11 10:45:43 seidel kernel: virbr0: port 2(vnet0) entered learning state
Okt 11 10:45:45 seidel kernel: virbr0: port 2(vnet0) entered forwarding state
Okt 11 10:45:45 seidel kernel: virbr0: topology change detected, propagating
Okt 11 10:46:14 seidel dnsmasq-dhcp[2610]: DHCPDISCOVER(virbr0) 52:54:00:b1:db:89
Okt 11 10:46:14 seidel dnsmasq-dhcp[2610]: DHCPOFFER(virbr0) 192.168.122.13 52:54:00:b1:db:89
Okt 11 10:46:14 seidel dnsmasq-dhcp[2610]: DHCPDISCOVER(virbr0) 52:54:00:b1:db:89
Okt 11 10:46:14 seidel dnsmasq-dhcp[2610]: DHCPOFFER(virbr0) 192.168.122.13 52:54:00:b1:db:89
Okt 11 10:46:14 seidel dnsmasq-dhcp[2610]: DHCPREQUEST(virbr0) 192.168.122.13 52:54:00:b1:db:89
Okt 11 10:46:14 seidel dnsmasq-dhcp[2610]: DHCPACK(virbr0) 192.168.122.13 52:54:00:b1:db:89 ubuntu

The guest also seems "normal" to me:
$ uvt-kvm ssh artful-test4 -- journalctl -u systemd-networkd
-- Logs begin at Wed 2017-10-11 10:46:03 UTC, end at Wed 2017-10-11 11:02:31 UTC. --
Oct 11 10:46:11 ubuntu systemd[1]: Starting Network Service...
Oct 11 10:46:11 ubuntu systemd-networkd[547]: Enumeration completed
Oct 11 10:46:11 ubuntu systemd[1]: Started Network Service.
Oct 11 10:46:11 ubuntu systemd-networkd[547]: enp1s0: IPv6 successfully enabled
Oct 11 10:46:11 ubuntu systemd-networkd[547]: enp1s0: Gained carrier
Oct 11 10:46:12 ubuntu systemd-networkd[547]: enp1s0: Gained IPv6LL
Oct 11 10:46:14 ubuntu systemd-networkd[547]: enp1s0: DHCPv4 address 192.168.122.13/24 via 192.168.122.1
Oct 11 10:46:14 ubuntu systemd-networkd[547]: Not connected to system bus, ignoring transient hostname.
Oct 11 10:46:24 ubuntu systemd-networkd[547]: enp1s0: Configured

The biggest and obviously most important difference is in the networking that is used:
    <interface type='network'>
      <mac address='52:54:00:b1:db:89'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

OpenStack, on the other hand, generates a type="bridge" network where the bridge is not managed by libvirt (compared to the default network I use).
Nevertheless, both setups create a bridge and attach the guest's tap device to it.
So this should be functionally equivalent, other than how the bridge is set up, right? A sketch of how the two bridges can be compared is below.
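
For that comparison, a minimal sketch of how the bridge and its attached tap device can be inspected on the host (bridge and tap names taken from the XML snippets in this bug; substitute your own):

$ brctl show virbr0                # libvirt-managed bridge with its vnet0 tap
$ brctl show qbrc8faaf66-7f        # openstack-created linux bridge with its tap device
$ sudo ovs-vsctl show              # OVS side of the wiring on the compute node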


I tried to make a bridge-type config matching what I had before, but close to yours:
    <interface type='bridge'>
      <mac address='52:54:00:b1:db:89'/>
      <source bridge='virbr0'/>
      <target dev='vnet0'/>   
      <model type='virtio'/>
      <alias name='net0'/>                
      <address type='virtio-mmio'/>
    </interface>

But this again works:
ubuntu@artful-test4:~$ journalctl -u systemd-networkd --no-pager
-- Logs begin at Wed 2017-10-11 11:12:10 UTC, end at Wed 2017-10-11 11:13:00 UTC. --
Oct 11 11:12:17 artful-test4 systemd[1]: Starting Network Service...
Oct 11 11:12:17 artful-test4 systemd-networkd[604]: Enumeration completed
Oct 11 11:12:17 artful-test4 systemd[1]: Started Network Service.
Oct 11 11:12:17 artful-test4 systemd-networkd[604]: enp1s0: IPv6 successfully enabled
Oct 11 11:12:17 artful-test4 systemd-networkd[604]: enp1s0: Gained carrier
Oct 11 11:12:18 artful-test4 systemd-networkd[604]: enp1s0: Gained IPv6LL
Oct 11 11:12:20 artful-test4 systemd-networkd[604]: enp1s0: DHCPv4 address 192.168.122.14/24 via 192.168.122.1
Oct 11 11:12:20 artful-test4 systemd-networkd[604]: Not connected to system bus, ignoring transient hostname.
Oct 11 11:12:30 artful-test4 systemd-networkd[604]: enp1s0: Configured

Dumping the XML to check whether libvirt might have rewritten a lot shows me that my config is preserved:
    <interface type='bridge'>
      <mac address='52:54:00:b1:db:89'/>
      <source bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='virtio-mmio'/>
    </interface>

So, TL;DR: a config almost the same as yours works.
Yet the difference is that libvirt has set up 'virbr0' in my case, and I'd assume OpenStack created qbrb5d335fc-46 on its own.

As a next step I'd recommend iterating your config over different bridge scenarios to find what breaks it.
Then, if reasonable, try to exclude OpenStack from the equation as I showed above (or not, if you think the fix needs to be in OpenStack).

I hope this helps to get closer to the root cause; I thought that a working example on the new Ocata stack might be the best thing to start with.

Note this was done on:
libvirt-daemon-system          3.6.0-1ubuntu4
qemu-kvm                       1:2.10+dfsg-0ubuntu1

Note - be careful about whether you mean upstream qemu/libvirt (against which the bug was filed) or the packages in Ubuntu (I added tasks for these).
I try to watch both, but will be more on track for the latter.

I spent some time today trying to modify the Ocata XML definitions produced in /etc/libvirt/qemu/<INSTANCE-ID>.xml for each of the generated instances: destroying and undefining the existing instance, making some changes, and redefining the XML. However, it appears that nova regenerates these definitions upon instance reset, removing any changes made to the XML. Is there any way to disable this behaviour that does not require serious modifications to the nova underpinnings?
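
For context, a minimal sketch of the kind of edit cycle attempted, assuming the instance name from the earlier nova show output (the edit itself is illustrative; virsh define simply replaces the existing definition):

$ sudo virsh destroy instance-00000003
$ sudo virsh dumpxml instance-00000003 > /tmp/instance-00000003.xml
  # edit /tmp/instance-00000003.xml, e.g. drop the <address type='virtio-mmio'/> line
$ sudo virsh define /tmp/instance-00000003.xml
$ sudo virsh start instance-00000003
  # nova rewrites this definition again on the next reset, which is the problem described above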

The problem is with virtio-mmio. 

https://bugzilla.redhat.com/show_bug.cgi?id=1422413

Instances launched with virtio-mmio on aarch64 will not get DHCP (will not have a NIC).

XML with libvirt 2.5.0:

    <interface type='bridge'>
      <mac address='fa:16:3e:af:95:2e'/>
      <source bridge='qbrb5abdeb0-a0'/>
      <target dev='tapb5abdeb0-a0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='virtio-mmio'/>

I have updated libvirt-daemon to 3.6.0 on a particular compute node; when an instance is booted now, the NIC section of the virsh XML looks like this:

    <interface type='bridge'>
      <mac address='fa:16:3e:10:0e:22'/>
      <source bridge='qbr274809a0-dc'/>
      <target dev='tap274809a0-dc'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>

The instance then gets a NIC and is able to get DHCP and complete cloud-init successfully.


So,
libvirt knows about some change and picks the right default if you do not specify it.
But if you (OpenStack in this case) specify virtio-mmio as the address type, then it fails - is that correct?

The patch in the referenced RH-BZ is already in the qemu we have in Artful (and thereby Ocata).
So if that is supposed to be the issue, there has to be a new one introduced after that fix.

Also, as I've shown in comment #17 (yes, it is long, sorry), virtio-mmio works with the bridge that libvirt usually creates; again, my assumption is that it is somehow related to how this bridge is created (Open vSwitch in your case, I assume).

So is the real error "network fails when using virtio-mmio on openvswitch set up by openstack"?
I have no OVS around to quickly try something around that atm.

Would it be reasonable to teach OpenStack not to define virtio-mmio in this case?
libvirt would then pick the right default (hopefully also when driven by OpenStack, which sets some force options) and just work.

Hi,
@admcleod - as discussed on IRC, I realized Ocata maps to Zesty, so that is qemu 2.8.
Therefore the referenced patch is released in Artful, but not in Zesty.

I tagged up the bug tasks and provided a fix in [1] to test.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2995

To further reinforce what Christian said, the Newton cloud archive also uses virtio-mmio for its address type (see my comment #15).

Newton XML:

    <interface type='bridge'>
      <mac address='fa:16:3e:8e:fc:48'/>
      <source bridge='qbrb5d335fc-46'/>
      <target dev='tapb5d335fc-46'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='virtio-mmio'/>
    </interface>

But we have proven this works with Newton. Is it fair to say this could be attributed to a change in OVS between Newton and Ocata?



This appears to be:  https://bugzilla.redhat.com/show_bug.cgi?id=1422413

Yeah, the offending patch in RH-BZ 1422413 appeared in qemu 2.8.
So it would make sense for this to work with Newton (2.6.1) and Pike (2.10), but fail on Ocata (2.8).

I checked the PPA; in general it seems to work for me, so I'm now waiting for your verification that it really addresses the issue you are facing.
If it does, I can pass it through regression tests and afterwards to SRU (the SRU queue currently holds another update that has to clear before doing so).

Since we are waiting for the zesty verification, I'm setting the status to Incomplete for now.

Since the fix mentioned in the previous comments is already in upstream QEMU, I'm setting the upstream status to "Fix released".

I've tested with the packages from the ppa:

https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2995


qemu:
  Installed: 1:2.8+dfsg-3ubuntu2.7~ppa5cloud
qemu-system-arm:
  Installed: 1:2.8+dfsg-3ubuntu2.7~ppa5cloud
qemu-system-aarch64:
  Installed: 1:2.8+dfsg-3ubuntu2.7~ppa5cloud


Rebooted the instance, and it acquired an IP address and booted.
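
A minimal sketch of that verification step, assuming the instance names from the earlier nova list (a hard reboot is used so a fresh qemu process is started with the upgraded binary):

$ nova reboot --hard sfeole1
$ nova console-log sfeole1 | grep -iE 'eth0|dhcp'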


More info - virsh dumpxml excerpt:

    <interface type='bridge'>
      <mac address='fa:16:3e:4e:1f:8f'/>
      <source bridge='qbrd542e755-45'/>
      <target dev='tapd542e755-45'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='virtio-mmio'/>

Will test these and report back shortly.

I've tested with the same packages listed in comment #28 and confirmed that this now works.

See attached log

Ok, driving that into an SRU then - thanks for verifying.

Regression tests on the PPA are good as well; we then need to double-check the auto-run on -proposed to ensure this doesn't behave differently on other architectures.
I cleaned up the changelog and UCA backport noise and made a proper SRU for zesty.

Note: there is currently another SRU in flight (already in verified state for 2 days), so acceptance from zesty-unapproved will likely have to wait a bit until the former one clears completely.

I accepted this but Launchpad timed out when the SRU testing comment was being added.

Thanks for the FYI Brian.
I see it in pending-SRUs as it should be.
I added the -needed tags to be "correct".

@Andrew/Sean - could you test -proposed and set the verification tags then (assuming it works like the PPA did)?


Odd - this really seems to hit everything, not only LP timeouts.
There are also dep8 regressions listed on systemd which make no sense in relation to the fix.
The test history on armhf has been broken since February [1], so I ask you to override and ignore that.
On s390x it at least works sometimes, but the log [2] doesn't look much better; there at least it hits a different issue than before. I'm retriggering this for now, and will likely ask to ignore that as well, but I'll take a look at the retry first.


[1]: http://autopkgtest.ubuntu.com/packages/s/systemd/zesty/armhf
[2]: http://autopkgtest.ubuntu.com/packages/s/systemd/zesty/s390x

As assumed, s390x passed now (so it was a flaky test), and as outlined before, on armhf we just have to give up.
Looking at the history, an override might be the right thing to do.

Other than that all looks good; waiting for the verification by Sean/Andrew now.

Ok, Zesty actually had an override for the arm test (I had to learn where those are placed for SRUs).
So it is only waiting on the verification now.

Hello Sean, or anyone else affected,

Accepted qemu into ocata-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:ocata-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-ocata-needed to verification-ocata-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-ocata-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Thanks Christian - I've now verified this. I took a stepwise approach:

1) We originally observed this issue w/ the ocata cloud archive on xenial, so I redeployed that. I verified that I was still seeing the problem. I then created a PPA[*] w/ an arm64 build of QEMU from the ocata-staging PPA, which is a backport of the zesty-proposed package, and upgraded my nova-compute nodes to this version. I rebooted my test guests, and the problem was resolved.

2) I then updated my sources.list to point to zesty (w/ proposed enabled), and upgraded qemu-system-arm. This way I could test the actual build in zesty-proposed, as opposed to my backport. This continued to work.

3) Finally, I dist-upgraded this system from xenial to zesty - so that I'm actually testing the zesty build in a zesty environment, and rebooted. Still worked :)

[*] https://launchpad.net/~dannf/+archive/ubuntu/lp1719196

This bug was fixed in the package qemu - 1:2.8+dfsg-3ubuntu2.7

---------------
qemu (1:2.8+dfsg-3ubuntu2.7) zesty; urgency=medium

  * d/p/ubuntu/virtio-Fix-no-interrupt-when-not-creating-msi-contro.patch:
    on Arm fix no interrupt when not creating msi controller. That fixes
    broken networking if running with virtio-mmio only (LP: #1719196).

 -- Christian Ehrhardt <email address hidden>  Wed, 18 Oct 2017 16:17:34 +0200

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates.  Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report.  In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.



Regression testing has passed successfully.

zesty-ocata-proposed with stable charms:

======
Totals
======
Ran: 102 tests in 1897.0150 sec.
 - Passed: 93
 - Skipped: 9
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 1011.5607 sec.

zesty-ocata-proposed with dev charms:

======
Totals
======
Ran: 102 tests in 1933.5299 sec.
 - Passed: 93
 - Skipped: 9
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 963.0546 sec.

xenial-ocata-proposed with stable charms:

======
Totals
======
Ran: 102 tests in 1767.3787 sec.
 - Passed: 93
 - Skipped: 9
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 906.2188 sec.

xenial-ocata-proposed with dev charms:

======
Totals
======
Ran: 102 tests in 2051.1377 sec.
 - Passed: 93
 - Skipped: 9
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 998.9001 sec.