This repository has been archived by the owner on Apr 12, 2021. It is now read-only.

conjure-up+MAAS(multidomain)+ubuntu18.04 failed on nova-cloud-controller (maybe DNS issue?) #1494

laralar opened this issue Jul 10, 2018 · 9 comments


laralar commented Jul 10, 2018

Report

Thank you for trying conjure-up! Before reporting a bug please make sure you've gone through this checklist:

Please provide the output of the following commands

conjure-up.tar.gz

sosreport-os-client-20180710142710.tar.zip

which juju
/snap/bin/juju
juju version
2.4.0-bionic-amd64
which conjure-up
/snap/bin/conjure-up
conjure-up --version
conjure-up 2.6.0

which lxc  # not installed, not configured
/snap/bin/lxc config show
/snap/bin/lxc version

cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04 LTS"

Please attach tarball of ~/.cache/conjure-up:

tar cvzf conjure-up.tar.gz ~/.cache/conjure-up

Sosreport

Please attach a sosreport:

sudo apt install sosreport
sosreport

The resulting output file can be attached to this issue.

What Spell was Selected?

openstack-base

What provider (aws, maas, localhost, etc)?

maas

MAAS Users

Which version of MAAS?
MAAS version: 2.4.0~beta2 (6865-gec43e47e6-0ubuntu1)

Commands ran

Please outline what commands were run to install and execute conjure-up:

conjure-up --apt-proxy http://apt-cacher.aibl.lan:3142 --apt-https-proxy http://apt-cacher.aibl.lan:3142

Additional Information

I am running into a similar issue as
#487

Everything works fine except:

nova-cloud-controller/0*  error  idle  2/lxd/2  10.3.5.101  8774/tcp,8778/tcp  hook failed: "cloud-compute-relation-changed"

aibladmin@os-client:~$ juju status
Model                       Controller                     Cloud/Region    Version  SLA          Timestamp
conjure-openstack-base-93d  conjure-up-cloud-maas-43d-4fe  cloud-maas-43d  2.4.0    unsupported  14:17:19+05:30

App                    Version        Status   Scale  Charm                  Store       Rev  OS      Notes
ceph-mon               12.2.4         active       3  ceph-mon               jujucharms   25  ubuntu
ceph-osd               12.2.4         active       3  ceph-osd               jujucharms  262  ubuntu
ceph-radosgw           12.2.4         active       1  ceph-radosgw           jujucharms  258  ubuntu
cinder                 12.0.1         active       1  cinder                 jujucharms  272  ubuntu
cinder-ceph            12.0.1         active       1  cinder-ceph            jujucharms  233  ubuntu
glance                 16.0.1         active       1  glance                 jujucharms  265  ubuntu
keystone               13.0.0         active       1  keystone               jujucharms  281  ubuntu
mysql                  5.7.20-29.24   active       1  percona-cluster        jujucharms  266  ubuntu
neutron-api            12.0.2         active       1  neutron-api            jujucharms  260  ubuntu
neutron-gateway        12.0.2         waiting      1  neutron-gateway        jujucharms  252  ubuntu
neutron-openvswitch    12.0.2         active       3  neutron-openvswitch    jujucharms  250  ubuntu
nova-cloud-controller  17.0.4         error        1  nova-cloud-controller  jujucharms  310  ubuntu
nova-compute           17.0.4         active       3  nova-compute           jujucharms  284  ubuntu
ntp                    4.2.8p10+dfsg  active       4  ntp                    jujucharms   24  ubuntu
openstack-dashboard    13.0.0         active       1  openstack-dashboard    jujucharms  259  ubuntu
rabbitmq-server        3.6.10         active       1  rabbitmq-server        jujucharms   74  ubuntu

Unit                      Workload  Agent  Machine  Public address  Ports              Message
ceph-mon/0*               active    idle   1/lxd/0  10.3.5.99                          Unit is ready and clustered
ceph-mon/1                active    idle   2/lxd/0  10.3.5.95                          Unit is ready and clustered
ceph-mon/2                active    idle   3/lxd/0  10.3.5.98                          Unit is ready and clustered
ceph-osd/0*               active    idle   1        10.3.5.88                          Unit is ready (1 OSD)
ceph-osd/1                active    idle   2        10.3.5.89                          Unit is ready (1 OSD)
ceph-osd/2                active    idle   3        10.3.5.90                          Unit is ready (1 OSD)
ceph-radosgw/0*           active    idle   0/lxd/0  10.3.5.92       80/tcp             Unit is ready
cinder/0*                 active    idle   1/lxd/1  10.3.5.94       8776/tcp           Unit is ready
  cinder-ceph/0*          active    idle            10.3.5.94                          Unit is ready
glance/0*                 active    idle   2/lxd/1  10.3.5.102      9292/tcp           Unit is ready
keystone/0*               active    idle   3/lxd/1  10.3.5.97       5000/tcp           Unit is ready
mysql/0*                  active    idle   0/lxd/1  10.3.5.91       3306/tcp           Unit is ready
neutron-api/0*            active    idle   1/lxd/2  10.3.5.100      9696/tcp           Unit is ready
neutron-gateway/0*        waiting   idle   0        10.3.4.111                         Incomplete relations: network-service
  ntp/1                   active    idle            10.3.4.111      123/udp            Ready
nova-cloud-controller/0*  error     idle   2/lxd/2  10.3.5.101      8774/tcp,8778/tcp  hook failed: "cloud-compute-relation-changed"
nova-compute/0*           active    idle   1        10.3.5.88                          Unit is ready
  neutron-openvswitch/1   active    idle            10.3.5.88                          Unit is ready
  ntp/2                   active    idle            10.3.5.88       123/udp            Ready
nova-compute/1            active    idle   2        10.3.5.89                          Unit is ready
  neutron-openvswitch/0*  active    idle            10.3.5.89                          Unit is ready
  ntp/0*                  active    idle            10.3.5.89       123/udp            Ready
nova-compute/2            active    idle   3        10.3.5.90                          Unit is ready
  neutron-openvswitch/2   active    idle            10.3.5.90                          Unit is ready
  ntp/3                   active    idle            10.3.5.90       123/udp            Ready
openstack-dashboard/0*    active    idle   3/lxd/2  10.3.5.96       80/tcp,443/tcp     Unit is ready
rabbitmq-server/0*        active    idle   0/lxd/2  10.3.5.93       5672/tcp           Unit is ready

Machine  State    DNS         Inst id              Series  AZ  Message
0        started  10.3.4.111  dbaqme               bionic  cs  Deployed
0/lxd/0  started  10.3.5.92   juju-5ccb90-0-lxd-0  bionic  cs  Container started
0/lxd/1  started  10.3.5.91   juju-5ccb90-0-lxd-1  bionic  cs  Container started
0/lxd/2  started  10.3.5.93   juju-5ccb90-0-lxd-2  bionic  cs  Container started
1        started  10.3.5.88   p6qqfe               bionic  cs  Deployed
1/lxd/0  started  10.3.5.99   juju-5ccb90-1-lxd-0  bionic  cs  Container started
1/lxd/1  started  10.3.5.94   juju-5ccb90-1-lxd-1  bionic  cs  Container started
1/lxd/2  started  10.3.5.100  juju-5ccb90-1-lxd-2  bionic  cs  Container started
2        started  10.3.5.89   wfreda               bionic  cs  Deployed
2/lxd/0  started  10.3.5.95   juju-5ccb90-2-lxd-0  bionic  cs  Container started
2/lxd/1  started  10.3.5.102  juju-5ccb90-2-lxd-1  bionic  cs  Container started
2/lxd/2  started  10.3.5.101  juju-5ccb90-2-lxd-2  bionic  cs  Container started
3        started  10.3.5.90   wdtry8               bionic  cs  Deployed
3/lxd/0  started  10.3.5.98   juju-5ccb90-3-lxd-0  bionic  cs  Container started
3/lxd/1  started  10.3.5.97   juju-5ccb90-3-lxd-1  bionic  cs  Container started
3/lxd/2  started  10.3.5.96   juju-5ccb90-3-lxd-2  bionic  cs  Container started

When running this command:
juju debug-log --replay -i unit-nova-cloud-controller-0

I can see

unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.juju-log cloud-compute:27: OpenStack release, database, or rabbitmq not ready for Cells V2
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed Traceback (most recent call last):
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 1183, in <module>
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     main()
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 1176, in main
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     hooks.execute(sys.argv)
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/core/hookenv.py", line 823, in execute
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     self._hooks[hook_name]()
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 671, in compute_changed
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     ssh_compute_add(key, rid=rid, unit=unit)
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/nova_cc_utils.py", line 1005, in ssh_compute_add
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     if ns_query(short):
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/contrib/network/ip.py", line 478, in ns_query
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     answers = dns.resolver.query(address, rtype)
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 1132, in query
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     raise_on_no_answer, source_port)
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 947, in query
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     raise NoNameservers(request=request, errors=errors)
unit-nova-cloud-controller-0: 14:28:54 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed dns.resolver.NoNameservers: All nameservers failed to answer the query node16. IN A: Server 127.0.0.53 UDP port 53 answered SERVFAIL
unit-nova-cloud-controller-0: 14:28:54 ERROR juju.worker.uniter.operation hook "cloud-compute-relation-changed" failed: exit status 1
unit-nova-cloud-controller-0: 14:28:54 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-nova-cloud-controller-0: 14:29:52 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook

The name couldn't be resolved.

The strange thing is that the DHCP-configured MAAS LXD containers can resolve the name; only certain Juju containers can't.

for example, juju ssh 0/lxd/0 resolves fine

ubuntu@juju-5ccb90-0-lxd-0:~$ ping node16
PING node16.aibl.lan (10.3.5.89) 56(84) bytes of data.
64 bytes from node16.aibl.lan (10.3.5.89): icmp_seq=1 ttl=64 time=0.104 ms
64 bytes from node16.aibl.lan (10.3.5.89): icmp_seq=2 ttl=64 time=0.170 ms
^C
ubuntu@juju-5ccb90-0-lxd-0:~$ cat /etc/netplan/99-juju.yaml
network:
  version: 2
  ethernets:
    eth0:
      match:
        macaddress: 00:16:3e:f0:2f:47
      addresses:
      - 10.3.5.92/23
      gateway4: 10.3.4.254
      nameservers:
        search: [aibl.lan]
        addresses: [10.3.4.10]
      routes:
      - to: 192.168.4.0/24
        via: 10.3.4.3
        metric: 0

ubuntu@juju-5ccb90-0-lxd-0:~$

but the nova-cloud-controller container can't

ubuntu@juju-5ccb90-2-lxd-2:~$ cat /etc/netplan/99-juju.yaml
network:
  version: 2
  ethernets:
    eth0:
      match:
        macaddress: 00:16:3e:71:fd:82
      addresses:
      - 10.3.5.101/23
      gateway4: 10.3.4.254
      nameservers:
        addresses: [10.3.4.10]
      routes:
      - to: 192.168.4.0/24
        via: 10.3.4.3
        metric: 0

ubuntu@juju-5ccb90-2-lxd-2:~$ ping node16
ping: node16: Temporary failure in name resolution
ubuntu@juju-5ccb90-2-lxd-2:~$ ping node16.aibl.lan
PING node16.aibl.lan (10.3.5.89) 56(84) bytes of data.
64 bytes from node16.aibl.lan (10.3.5.89): icmp_seq=1 ttl=64 time=0.109 ms
^C
--- node16.aibl.lan ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.109/0.109/0.109/0.000 ms
ubuntu@juju-5ccb90-2-lxd-2:~$
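To narrow this down inside an affected container, one can compare short-name and FQDN lookups and inspect what the resolver is actually configured with (a diagnostic sketch; command names as on Ubuntu 18.04, where the tool is `systemd-resolve` rather than the later `resolvectl`):

```shell
# Inside the affected container (e.g. via `juju ssh nova-cloud-controller/0`):
getent hosts node16            # fails when no search domain is configured
getent hosts node16.aibl.lan   # succeeds: the DNS record itself is fine

# Show the DNS servers and search domains systemd-resolved is using:
systemd-resolve --status | grep -A3 'DNS Domain'
```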

Note that 0/lxd/0 has search: [aibl.lan] but the nova-cloud-controller container doesn't.

In MAAS I have many domains; aibl.lan is the primary domain, and DHCP-assigned containers resolve just fine the very name that the Juju nova-cloud-controller LXD container cannot.
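This is consistent with how a stub resolver handles short names: without a search list, "node16" is sent upstream as-is instead of being expanded to "node16.aibl.lan". A minimal illustrative sketch (not the charm's actual code) of that expansion:

```python
# Illustrative sketch: how a resolver's search list expands a short
# hostname into the FQDN candidates it will actually query.
def candidate_queries(hostname, search_domains):
    """Return the names a stub resolver would try, in order."""
    if "." in hostname:
        # Already qualified: queried as-is.
        return [hostname]
    # Short name: try each search domain first, then the bare name.
    return [f"{hostname}.{d}" for d in search_domains] + [hostname]

# With the netplan search domain, "node16" expands to "node16.aibl.lan":
print(candidate_queries("node16", ["aibl.lan"]))   # ['node16.aibl.lan', 'node16']
# Without any search domain, only the bare "node16" is tried, which the
# upstream DNS server answers with SERVFAIL:
print(candidate_queries("node16", []))             # ['node16']
```

This is why `ping node16.aibl.lan` works in the broken container while `ping node16` does not.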

Thanks


laralar commented Jul 10, 2018

I executed the whole thing a second time; this time the 0/lxd/0 containers didn't have the search: [aibl.lan] entry either.

Anyway, I logged into the 2/lxd/2 nova-cloud-controller container and manually edited the netplan file,
adding search: [aibl.lan]:

ubuntu@juju-c6ff2e-2-lxd-2:~$ cat /etc/netplan/99-juju.yaml
network:
  version: 2
  ethernets:
    eth0:
      match:
        macaddress: 00:16:3e:d1:79:d7
      addresses:
      - 10.3.5.116/23
      gateway4: 10.3.4.254
      nameservers:
        search: [aibl.lan]
        addresses: [10.3.4.10]
      routes:
      - to: 192.168.4.0/24
        via: 10.3.4.3
        metric: 0

And everything finished successfully.
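For anyone hitting the same error, the workaround can be summarized as follows (a sketch of the steps described in this thread; unit and file names as in this deployment):

```shell
# 1. Enter the failing container
juju ssh nova-cloud-controller/0

# 2. Add the MAAS search domain to the netplan config
#    (under nameservers:, add `search: [aibl.lan]`) and apply it
sudo nano /etc/netplan/99-juju.yaml
sudo netplan apply

# 3. Back on the client, retry the failed hook
juju resolved nova-cloud-controller/0
```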

So the question is: why is this not being done automatically? Is it a MAAS error?

What I saw is that Juju containers are deployed with "Auto assign" IPs from MAAS, not with DHCP. If they were deployed with DHCP they wouldn't have any issue, since DHCP supplies the search domain, but static and auto-assigned IPs don't.


laralar commented Jul 10, 2018

Also, I get the message:
☑ Horizon Login to Horizon: http://10.3.5.119/horizon l: admin p: openstack

However, when I go to that page there is a Domain Name field. If I leave it empty, login fails; if I enter aibl.lan it fails; and if I enter admin_domain, as I read somewhere else, it also fails.

How do I log in to Horizon?


laralar commented Jul 10, 2018

I got the password with juju run --unit keystone/0 leader-get admin_passwd, as stated elsewhere, and logged in. However, why is this not in the documentation? Why is this password generated instead of the "openstack" that the documentation states? Also, the last screen of conjure-up says the username and password are admin/openstack and doesn't mention anything about the domain.

@adam-stokes

This was with the openstack novakvm spell? I think it needs to be updated so the summary shows the domain section like it does here: https://github.com/conjure-up/spells/blob/master/openstack-novalxd/steps/04_horizon/after-deploy#L7

I'll get that updated, and see if I can get some Juju folks to look into the MAAS issue.

adam-stokes self-assigned this Jul 10, 2018

laralar commented Jul 11, 2018

  1. Correct. I selected novaKVM (which I think is openstack-base).

  2. Not only the summary. What about the DNS? I have to manually SSH into the Juju container, update the netplan configuration file with the search field, and execute netplan apply, all of this before it shows the error; then it succeeds.

@adam-stokes

@laralar thanks, I'm going to spend time this week to dig into this spell and get it working.

@babarzahoor

I am also facing this issue in the OpenStack novaKVM Space; any updates regarding a fix for this bug?

@babarzahoor

*spell = Openstack-base novaKVM

@babarzahoor

I did a small hack to fix this issue during the installation of OpenStack with conjure-up

juju ssh nova-cloud-controller/0
sudo bash
echo "nameserver <IP of MAAS DNS server>" > /etc/resolv.conf   # substitute your MAAS DNS server address
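Note that on bionic, /etc/resolv.conf is normally a symlink to a file managed by systemd-resolved, so overwriting it directly may not survive. A more durable variant of the same fix (an assumption, matching the netplan edit described earlier in this thread) is to declare the MAAS DNS server and search domain in the container's netplan config and re-apply it:

```yaml
# /etc/netplan/99-juju.yaml (fragment) -- then run `sudo netplan apply`
      nameservers:
        search: [aibl.lan]
        addresses: [10.3.4.10]
```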
